Introduction

The  34th  Annual  Sanibel  Symposium,  organized  by  the  faculty  and  staff  of  the 
Quantum Theory Project of the University of Florida, was held on February 12-19,
1994, at the Marriott, Sawgrass Resort, Ponte Vedra Beach, Florida. Over 300
participants  gathered  for  8  days  of  lectures  and  informal  discussions. 

The  format  of  the  symposium  adopted  for  the  past  few  years  was  followed  again 
this  year  with  a  compact  8-day  schedule  with  an  integrated  program  of  quantum 
biology,  quantum  chemistry,  and  condensed  matter  physics.  The  topics  of  the 
sessions  covered  by  these  proceedings  include  Quantum  Chemistry  of  Biological 
Molecules,  Spectroscopic  Signatures  of  Biological  Molecules,  Protein  Folding,  and 
Photosynthesis. 

The articles have been subjected to the ordinary refereeing procedures of the
International Journal of Quantum Chemistry. The articles presented in the sessions on
quantum chemistry, condensed matter physics, and associated poster sessions are
published in a separate volume of the International Journal of Quantum Chemistry.

The  organizers  acknowledge  the  following  sponsors  for  their  support  of  the  1994 
Sanibel  Symposium: 

•  The Office of Naval Research through Grant N00014-93-1-0343

“This  work  relates  to  Department  of  Navy  Grant  N00014-93-1-0343  issued 
by  the  Office  of  Naval  Research.  The  United  States  Government  has  a  royalty- 
free  license  throughout  the  world  in  all  copyrightable  material  contained 
herein.” 

•  U.S. Army Research Office (Physics)/CRDEC and U.S. Army Edgewood RD&E
Center through Grant DAAH04-94-G-0015

“The  views,  opinions,  and/or  findings  contained  in  this  report  are  those  of 
the  author(s)  and  should  not  be  construed  as  an  official  Department  of  the 
Army position, policy, or decision, unless so designated by other documentation.”

•  U.S.  Department  of  Energy  through  Grant  DE-FG05-94ER61785 

•  International  Science  Foundation 

•  CAChe 

•  IBM 

•  International  Society  of  Quantum  Biology  and  Pharmacology 

•  Silicon  Graphics 

•  Sun  Microsystems 

•  The  University  of  Florida. 

Very  special  thanks  go  to  the  staff  of  the  Quantum  Theory  Project  of  the  University 
of  Florida  for  handling  the  numerous  administrative,  clerical,  and  practical  details. 
The organizers are proud to recognize the contributions of Mrs. Judy Parker, Ms.
Leann Golemo, Mrs. Karen Yanke, Ms. Sandra Weakland, Mr. Sullivan Beck, Dr.
Agustin Diz, and Dr. Erik Deumens. All the graduate students of the Quantum
Theory Project who served as “gofers” are gratefully recognized for their
contributions to the 1994 Sanibel Symposium.


N.  Y.  Ohrn 
J.  R.  Sabin 
M.  C.  Zerner 


Postulates  for  Protein  (Hydrophobic) 
Folding  and  Function 


DAN  W.  URRY 

Laboratory  of  Molecular  Biophysics,  School  of  Medicine,  The  University  of  Alabama  at  Birmingham, 
VH  300,  Birmingham,  Alabama  35294-0019 


Abstract 

The  previously  demonstrated  capacity  to  utilize  the  hydrophobic  folding  and  assembly  transition  in 
designed  model  proteins  to  perform  diverse  energy  conversions  is  formalized  in  terms  of  three  postulates 
with an associated 15 corollaries. Each corollary defines one of the 15 pairwise free-energy transductions
involving the six intensive variables: mechanical force, temperature, pressure, chemical potential,
electrochemical potential, and electromagnetic radiation. The first postulate directly involves the input of
thermal energy to raise the temperature from below to above that temperature required to drive
hydrophobic folding and assembly, with the resultant capacity to perform useful mechanical work. The second
postulate  considers  the  energy  inputs  that  can  lower  or  raise  the  temperature  range  over  which  the 
hydrophobic  folding  transition  occurs;  these  energy  inputs  can  thereby  perform  mechanical  work.  The 
third  postulate  treats  the  energy  conversions  not  involving  mechanical  force,  whereby  a  pair  of  functional 
groups  becomes  coupled  by  each  being  a  part  of  the  same  hydrophobic  association  process  with  each 
functional  group  being  able  individually  to  drive  the  hydrophobic  folding  transition  and  thereby  to 
change  the  state  of  the  second  functional  group  in  an  energy-conversion  process.  It  is  then  shown  how 
a  model  protein  can  be  so  designed  as  to  function  in  a  second-order  process  treated  by  Postulate  III  and 
how  the  efficiency  of  energy  conversion  can  be  enhanced.  Finally,  these  energy-conversion  studies  using 
the  hydrophobic  folding  and  assembly  transition  in  model  proteins  are  related  to  a  theoretical  model  for 
cooperative  hydrophobic  folding  and  to  experimental  studies  suggesting  a  central  role  for  hydrophobic 
folding  in  the  general  problem  of  protein  folding.  ©  1994  John  Wiley  &  Sons,  Inc. 

Protein  Folding  and  Function 

It  is  an  aphorism  that  understanding  protein  folding  is  essential  to  understanding 
protein  function.  Although,  commonly,  protein  function  has  been  considered  in 
terms  of  active  sites  and  the  breaking  and  forming  of  bonds  therein,  much  of  protein 
function  in  living  organisms  can  be  viewed  in  terms  of  the  conversion  of  energy 
from  an  available  form  to  a  more  directly  useful  form.  Indeed,  this  is  a  key  step  in 
the  evolution  of  living  organisms,  and  it  can  be  achieved  by  controlling  protein 
folding.  Most  obviously,  light  and  chemical  stores  such  as  starches  and  fats  are 
recognized  as  sources  of  energy  for  living  organisms  and,  of  course,  there  is  the 
universal  immediate  chemical  energy  source,  adenosine  triphosphate  (ATP). 

A  construct  that  can  convert  energy  from  one  form  to  another  can  be  called  a 
machine.  The  molecular  constructs,  proteins,  could  therefore  be  reasonably  called 
molecular  machines.  Those  particular  molecular  machines  that  can  convert  energy 
into  useful  mechanical  motion  could  be  called  molecular  engines.  The  actin  and 
myosin molecular construct of muscle could be considered a molecular engine as
this molecular construct gives rise to useful mechanical motion on using ATP as
an energy source. But there are many other energies at play in living organisms.
There are some six forms of free energy involved in the functioning of proteins in
living organisms, the intensive variables of which are mechanical force, temperature,
pressure, chemical potential, electrochemical potential, and electromagnetic
radiation, and proteins are the molecular machines that perform these energy conversions.
Interestingly,  as  will  be  presented  below,  hydrophobic  folding  of  proteins,  seen  in 
terms  of  certain  model  proteins,  appears  to  be  an  element  of  folding  that  can  perform 
the  15  pairwise  free-energy  transductions  possible  given  the  above  six  intensive 
variables  of  the  free  energy  [1]. 
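
The count of 15 follows from choosing unordered pairs out of six variables. A minimal sketch of that arithmetic is given here; the variable names are taken from the text, while the helper name pairwise_transductions is only illustrative. It also separates the five pairs that involve mechanical force from the ten that do not.

```python
from itertools import combinations

# The six intensive variables named in the text.
INTENSIVE_VARIABLES = [
    "mechanical force",
    "temperature",
    "pressure",
    "chemical potential",
    "electrochemical potential",
    "electromagnetic radiation",
]


def pairwise_transductions(variables):
    """Return every unordered pair of distinct variables (illustrative helper)."""
    return list(combinations(variables, 2))


pairs = pairwise_transductions(INTENSIVE_VARIABLES)

# Pairs with one end at mechanical force versus all remaining pairs.
mechanical = [p for p in pairs if "mechanical force" in p]
non_mechanical = [p for p in pairs if "mechanical force" not in p]

print(len(pairs), len(mechanical), len(non_mechanical))
```

The split into five and ten pairs is the same division used later between the mechanical and the nonmechanical energy conversions.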

Postulates  for  Protein-Catalyzed  Energy  Conversion  Using  Hydrophobic  Folding 

The following analysis has an underlying hypothesis that all the energy conversion
functions of proteins can be achieved by controlling hydrophobic folding and
assembly. This hypothesis originated in a paper presented at the 1987 Sanibel
Symposium and published in its Proceedings [2]. This hypothesis
can now be stated in terms of three postulates and an associated 15 corollaries.
These are listed in Table I.

Model Proteins as First-Order Molecular Machines of the Tt Type: Molecular Engines

Hydrophobic folding and assembly of amphiphilic polymers like proteins are
such that at sufficiently low temperature in water the hydrophobic moieties are
surrounded by a water of hydrophobic hydration. Thermodynamically, this water
of hydrophobic hydration is of low entropy and is exothermic in its formation
[3-10], and, structurally, it is characterized by water molecules arranged at the corners
of pentagons with some 12 such pentagons able to enclose a hydrophobic entity
such as a methane molecule [11,12].

Depending on the particular composition of a protein-based polymer construct
containing both hydrophobic and polar components, as the temperature is raised
through a discrete transition temperature range, the pentagonally arranged waters
of hydrophobic hydration destructure in an endothermic reaction as the protein
part of the system folds and associates using the now available hydrophobic
intramolecular and intermolecular contacts [9,10,12-16]. This is referred to as an inverse
temperature transition because the protein part of the system increases order with
increase in temperature through the transition temperature range, the onset
temperature for which is designated as Tt [1,17].

Postulate I. Using elastic model proteins, also referred to as elastic protein-based
polymers, of the composition poly[fV(VPGVG),fX(VPGXG)], where fV and fX
are mole fractions with fV + fX = 1, it has been possible to observe this inverse
temperature transition process in wholly synthetic systems by γ-irradiation
cross-linking to form macroscopic elastic matrices. At temperatures lower than Tt, the
elastic matrix is swollen; on raising the temperature above Tt through the
temperature range of the inverse temperature transition, the elastic matrix contracts to
less than one-half its swollen dimension [18,19]. When a weight is hung on the
swollen elastic matrix and the temperature is raised above Tt, the matrix contracts
and lifts the weight. Thus, this construct that hydrophobically folds and assembles
on raising the temperature is a molecular machine capable of performing useful
mechanical motion; it is a molecular engine. Thermal energy has been converted
into mechanical work. The process is thermomechanical transduction.


Table I.  Postulates for energy conversion by means of hydrophobic folding and assembly (inverse
temperature transitions).

Postulate I: The input of thermal energy to a protein capable of hydrophobic folding and assembly on
raising the temperature from below to above the temperature, Tt, of an inverse temperature
transition can result in motion and the performance of mechanical work

    Corollary: Thermomechanical transduction

Postulate II: Any energy input that changes the temperature, Tt, at which an inverse temperature
transition occurs can be used to produce motion and perform mechanical work

    Corollary 1: Chemomechanical transduction
    Corollary 2: Electromechanical transduction
    Corollary 3: Baromechanical transduction
    Corollary 4: Photomechanical transduction

Postulate III: Different energy inputs, each of which can individually drive hydrophobic folding to
produce motion and perform mechanical work, can be converted one into the other (transduced)
by means of the inverse temperature transition with the correctly designed coupling and Tt value

    Corollary 1: Electrochemical transduction
    Corollary 2: Electrothermal transduction
    Corollary 3: Baroelectrical transduction
    Corollary 4: Photovoltaic transduction
    Corollary 5: Thermochemical transduction
    Corollary 6: Photothermal transduction
    Corollary 7: Barothermal transduction
    Corollary 8: Barochemical transduction
    Corollary 9: Photobaric transduction
    Corollary 10: Photochemical transduction

A Tt-based Hydrophobicity Scale. By systematically varying X in
poly[fV(VPGVG),fX(VPGXG)], a unique hydrophobicity scale has been developed
that is based directly on the hydrophobic folding process of interest. The value of
Tt for poly(VPGVG), i.e., for fV = 1, is 25°C. When X = Ile(I) and fX = 1 as in
poly(VPGIG), the addition of the one CH2 moiety per pentamer lowers the value
of Tt some 15°C. When X = Ala(A) as in poly(VPGAG), the loss of two CH2
moieties per pentamer raises the value of Tt some 30°C. A plot of fX vs. Tt is given
in Figure 1 for poly[fV(VPGVG),fX(VPGXG)] in phosphate buffered saline, and
the values of Tt extrapolated to fX = 1 are given in Table II. The more hydrophobic
a residue is, the lower is the value of Tt, and the less hydrophobic, the higher the
value of Tt [1,20].


Temperature of Inverse Temperature Transition, Tt,
for poly[fV(VPGVG),fX(VPGXG)]

Figure 1.  Plot of the mole fraction, fX, of pentamers containing the guest residue X vs.
the temperature, Tt, for the onset of the hydrophobic folding and assembly transition for
model proteins of the structure poly[fV(VPGVG),fX(VPGXG)], where fV and fX are the
mole fractions with fV + fX = 1 and where X is any naturally occurring amino acid residue
or a chemical modification thereof. Reproduced with permission from [20].

Postulate II and the ΔTt Mechanism of Energy Conversion. Note in Table II and
Figure 1 that the values of Tt are very different for the Glu(E) residue when the
carboxyl side chain is protonated as COOH (where Tt = 30°C) than when the side
chain is ionized as COO- (where Tt = 250°C). This means that at 37°C a
cross-linked elastic matrix of poly[0.8(VPGVG),0.2(VPGEG)] will be contracted at
pH 3 and swollen at neutral pH. Thus, it is possible to attach a weight to the swollen
elastic matrix at 37°C and pH 7 and to lower the pH to 3 and cause contraction
and the lifting of the weight [21]. This involves the chemical energy input of raising
the proton concentration (increasing the proton chemical potential) with the
resulting performance of useful mechanical work. This is chemomechanical
transduction (Corollary 1 of Table I).

It may also be noted in Table II, for the N-methyl nicotinamide moiety attached
to a lysine, Lys(K), side chain by amide linkage, that the value of Tt is 120°C
when oxidized but -130°C when reduced. Thus, it is possible in the cross-linked
protein-based polymer of poly[0.73(VPGVG),0.27(VPGK{NMeN}G)] at 37°C
to attach a weight to the swollen (oxidized) elastic matrix and to reduce the
nicotinamide to drive contraction and perform useful mechanical work [1]. This can
be the input of electrical energy with the result of useful mechanical work; it is
electromechanical transduction (Corollary 2 of Table I).
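
The two examples above can be checked with a rough numerical sketch. It assumes, as the linear extrapolation to fX = 1 in Table II suggests but the text does not state as a formula, that Tt of a copolymer varies linearly with the mole fraction fX between the Val reference value and the guest-residue value. The function names estimate_tt and matrix_state are hypothetical, and the sketch ignores the finite width of the transition by comparing the operating temperature with the onset value only.

```python
# Tt values (deg C) extrapolated to fX = 1, copied from Table II.
TT_VAL_REFERENCE = 24.0
TT_GUEST = {
    "Glu(COOH)": 30.0,
    "Glu(COO-)": 250.0,
    "Lys(NMeN, oxidized)": 120.0,
    "Lys(NMeN, reduced)": -130.0,
}


def estimate_tt(f_x, guest):
    """Assumed linear interpolation between poly(VPGVG) and the fX = 1 value."""
    if not 0.0 <= f_x <= 1.0:
        raise ValueError("mole fraction must lie in [0, 1]")
    return (1.0 - f_x) * TT_VAL_REFERENCE + f_x * TT_GUEST[guest]


def matrix_state(tt, operating_temp=37.0):
    """Contracted (folded) when operating above Tt, swollen when below it."""
    return "contracted" if operating_temp > tt else "swollen"


# poly[0.8(VPGVG),0.2(VPGEG)]: protonated (low pH) versus ionized (neutral pH).
for guest in ("Glu(COOH)", "Glu(COO-)"):
    tt = estimate_tt(0.2, guest)
    print(guest, round(tt, 1), matrix_state(tt))

# poly[0.73(VPGVG),0.27(VPGK{NMeN}G)]: oxidized versus reduced nicotinamide.
for guest in ("Lys(NMeN, oxidized)", "Lys(NMeN, reduced)"):
    tt = estimate_tt(0.27, guest)
    print(guest, round(tt, 1), matrix_state(tt))
```

Under this assumed linear rule the estimated Tt values fall on opposite sides of 37°C for each pair of states, which is consistent with the swollen and contracted states described in the text.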
Table II.  Tt-based hydrophobicity scale for proteins. Tt = temperature of inverse temperature
transition for poly[fV(VPGVG),fX(VPGXG)].a

Residue X                          Tt, linearly extrapolated    Correlation
                                   to fX = 1 (°C)               coefficient

Lys(NMeN, reduced)b                     -130                      1.000
Trp           (W)                        -90                       .993
Tyr           (Y)                        -55                       .999
Phe           (F)                        -30                       .999
His (pH 8)    (H°)                       -10                      1.000
Pro           (P)c                       (-8)                     Calculated
Leu           (L)                          5                       .999
Ile           (I)                         10                       .999
Met           (M)                         20                       .996
Val           (V)                         24                      Reference
Glu(COOCH3)   (Em)                        25                      1.000
Glu(COOH)     (E°)                        30                      1.000
Cys           (C)                         30                      1.000
His (pH 4)    (H+)                        30                      1.000
Lys(NH2)      (K°)                        35                       .936
Pro           (P)d                        40                       .950
Asp(COOH)     (D°)                        45                       .994
Ala           (A)                         45                       .997
HyP                                       50                       .998
Asn           (N)                         50                       .997
Ser           (S)                         50                       .997
Thr           (T)                         50                       .999
Gly           (G)                         55                       .999
Arg           (R)                         60                      1.000
Gln           (Q)                         60                       .999
Lys(NH3+)     (K+)                       120                       .999
Tyr(φ-O-)     (Y-)                       120                       .996
Lys(NMeN, oxidized)b                     120                      1.000
Asp(COO-)     (D-)                       170                       .999
Glu(COO-)     (E-)                       250                      1.000
Ser(PO4-)                               1000                      1.000

a Adapted with permission from [20].

b NMeN is for N-methyl nicotinamide pendant on a lysyl side chain, i.e., N-methyl nicotinate attached
by amide linkage to the εNH2 of Lys, and the reduced state is N-methyl-1,6-dihydronicotinamide.

c The calculated Tt value for Pro comes from poly(VPGVG) when the experimental values of Val and
Gly are used. This hydrophobicity value of -8°C is unique to the β-spiral structure where there is
hydrophobic contact between the Val1 γCH3 and Pro2 βCH2 moieties.

d The experimental value determined from poly[fV(VPGVG),fP(PPGVG)].


It has also been found that the application of pressure, particularly when X is an
aromatic residue such as Phe(F), Tyr(Y), or Trp(W), will raise the value of Tt.
Accordingly, in the appropriate elastic matrix and at a temperature just above Tt,
the application of pressure will cause swelling with the lowering of an attached
weight and the release of pressure will cause a contraction with the lifting of the
weight [22]. This is baromechanical transduction (Corollary 3 of Table I).

Most  recently,  it  has  been  demonstrated  that  when  azobenzene  is  attached  to  a 
Glu(E)  side  chain  the  appropriate  wavelength  of  light  will  reversibly  increase  the 
value  of  Tt  due  to  a  light-driven  trans  ->  cis  geometrical  isomerism,  where  the  cis 
state  is  less  hydrophobic  than  is  the  trans  state  [23].  This  provides  a  basis  for 
photomechanical  transduction  (Corollary  4  of  Table  I).  A  similar  effect  has  been 
found  when  cinnamic  acid  is  the  chromophore  attached  to  a  Lys(K)  residue 
(Heimbach  et  al.,  unpublished  data). 

The  phenomenological  basis  for  Postulate  II  is  that  a  properly  designed  protein- 
based  polymer  construct  will  change  its  value  of  Tt  as  the  result  of  a  particular  energy 
input. This is called the ΔTt mechanism of free-energy transduction because, e.g., the
lowering  of  Tt  can  cause  contraction  and  the  performance  of  useful  motion  [1]. 

Model  Proteins  as  Second-Order  Molecular  Machines  of  the  Tt  Type 

In  the  above  five  conversions  of  free  energy,  i.e.,  of  thermo-,  chemo-,  electro-, 
baro-,  and  photomechanical  transduction,  it  was  the  hydrophobic  folding  process 
that  directly  performed  the  useful  mechanical  work.  These  molecular  engines  were 
therefore termed first-order molecular machines of the Tt type, where Tt type
indicated the inverse temperature transition of hydrophobic folding and assembly on
raising  the  temperature  from  below  to  above  the  transition  temperature  range.  In 
terms  of  Figure  2,  these  are  the  five  pairwise  energy-conversion  arrows  that  have 
one  end  at  the  mechanical  force  apex. 

Postulate  III.  Now  it  is  also  possible  to  use  the  hydrophobic  folding  and  assembly 
process  to  perform  (catalyze)  energy  conversions  not  microscopically  involving 
the  intensive  variable  of  mechanical  force.  These  energy  conversions  involve  the 
other  10  pairwise  energy  conversions  of  Figure  2  not  involving  the  mechanical  force 
apex, and they are the 10 Corollaries of Postulate III, which, itself, states that
“different energy inputs, each of which can individually drive hydrophobic folding to
produce  motion  and  perform  mechanical  work,  can  be  converted  one  into  the  other 
(transduced)  by  means  of  the  inverse  temperature  transition  with  the  correctly 
designed  coupling  of  functional  moieties  and  Tt  value.”  Energy  conversions  of  this 
Tt  type  can  involve  the  proper  balancing  of  mean  hydrophobicity  of  a  potential 
hydrophobic  domain  with  a  resident  pair  of  different  functional  groups  in  their 
more  polar  states  such  that  Tt  is  just  above  the  desired  operating  temperature,  and 
it  can  involve  designs  for  enhanced  efficiency  for  energy  conversion.  The  design 
process  to  achieve  this  balancing  and  enhanced  efficiency  is  called  poising,  as  will 
be  discussed  below. 


Poising  and  Enhanced  Efficiency  of  Energy  Conversion 

To  poise  is  to  draw  up  into  readiness.  With  regard  to  the  design  of  second-order 
elastic  molecular  machines  of  the  Tt  type,  it  is  to  so  design  an  amphiphilic  polymer 
such  that  a  change  in  one  form  of  free  energy  can  trigger  a  change  in  a  second  form 
of free energy. For example, the reduction of an oxidized prosthetic group involving
the free-energy change, zFE°Δn, noted in the electrical apex of Figure 2 can cause an
increase in the pKa from below to above the solution pH of a second functional moiety,
e.g., a COOH/COO- chemical couple, resulting in protonation and a decrease in the
chemical  potential  of  protons  in  solution.  Or,  conversely,  it  could  be  the  protonation 
of  a  carboxylate  that  would  change  the  redox  potential  of  a  prosthetic  group.  It  should 
be  emphasized  that  the  protonatable  and  reducible  functional  groups  are  separate 
chemical  entities  that  could  be  far  removed  in  terms  of  covalent  structure  but  which 
have  in  common  that  they  are  part  of  the  same  hydrophobic  folding  domain. 


Balancing  of  Hydrophobicity  and  a  Pair  of  Functional  Groups  in  Their  More 
Polar  States 

This  balancing  of  hydrophobicity  and  a  pair  of  functional  groups  in  their  more 
polar  states  can  be  depicted  in  terms  of  temperature  profiles  for  folding  and/or 
aggregation as shown in Figure 3. The composition of the elastic protein-based
polymer is designed such that, with a Glu(E) or Asp(D) residue with the carboxylate
side chain and with an N-methyl nicotinamide attached to a Lys(K) side chain in
its oxidized state, the hydrophobic folding transition begins just above the operating
temperature, T0. If the nicotinamide is reduced as in Figure 3(A), the temperature
for the hydrophobic folding transition will be lowered to well below T0, causing
an increase in the pKa of the Glu or Asp residue above that of the solution pH
such that folding will occur with the protonation of the carboxylate, i.e., reduction
results in a ΔpKa and the uptake of a proton. If the carboxylate is protonated as
in Figure 3(B), the drive for hydrophobic folding causes a change in the redox
potential, a ΔE'0, such that under the appropriate condition there can be the
reduction of the nicotinamide, i.e., protonation can result in the uptake of an electron.
As seen in Figure 3, this is again the result of the ΔTt mechanism.
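
How a pKa shift from below to above the solution pH translates into proton uptake can be illustrated with the standard Henderson-Hasselbalch relation. This is a minimal sketch: the pH and the two pKa values are hypothetical numbers chosen only for illustration and are not measurements from this work.

```python
def fraction_protonated(pH, pKa):
    """Henderson-Hasselbalch: fraction of a COOH/COO- couple present as COOH."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


# Hypothetical values for illustration only (not taken from the paper).
solution_pH = 6.0
pKa_unfolded = 4.5   # assumed pKa before the hydrophobic folding transition
pKa_folded = 7.5     # assumed pKa after the hydrophobic-induced shift

before = fraction_protonated(solution_pH, pKa_unfolded)
after = fraction_protonated(solution_pH, pKa_folded)

# Net protons taken up per carboxyl group when the pKa crosses the solution pH.
uptake = after - before
print(round(before, 3), round(after, 3), round(uptake, 3))
```

With these assumed numbers the carboxyl goes from almost fully ionized to almost fully protonated at fixed pH, which is the sense in which reduction of the coupled group results in the uptake of a proton.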


Nonlinear  Hydrophobic-induced  pKa  Shifts 

As the model protein becomes more hydrophobic, the change in pKa resulting
from a given change in hydrophobicity, Δhpb, becomes larger. Thus, in
poly[fV(IPGVG),fX(IPGXG)], where X can be Glu(E) or Asp(D), the change
in hydrophobicity represented by replacement, e.g., of one Glu by a Val residue
per 30 residues, has a much greater effect on the pKa of the Glu residue when the
change is from 2 Glu residues to 1 Glu residue per 30 residues than from 10 Glu
residues to 9 Glu residues per 30 residues [24,25]. Similarly, for a constant Glu or
Asp content per 30mer, the replacement of five Val residues by Phe residues causes
very large pKa shifts [26,27], whereas the replacement of the first two Val by two
Phe residues per 30 residues has only a very small effect on the pKa of the Glu or
Asp residue (Urry et al., unpublished data); going from three Phe residues
to five Phe residues, therefore, has a very large effect on the pKa of Glu or Asp.
This is referred to as a nonlinear hydrophobic-induced pKa shift, as schematically
shown in Figure 4. Now, if the reduction of a single nicotinamide moiety per 30mer
occurred with a single Glu or Asp residue per 30mer and there were no Phe residues
in the 30mer, then the change in pKa would be relatively small, as depicted in
Figure 4. If, however, there were two or three Phe residues per 30mer and the
nicotinamide were reduced, the change in pKa of the Glu or Asp residue is expected
to be much larger, as also shown schematically in Figure 4. This is another aspect
of poising and it involves a more efficient conversion of electrical (or light) energy
into chemical energy.


[Figure 3: panels (a) and (b); horizontal axis: Temperature]

Figure 3.  Design of a model protein containing two different functional groups in their
more polar state in which the composition has been adjusted such that the hydrophobic
folding transition occurs just above a working temperature, T0. The energy input that
converts either of the functional groups to a more hydrophobic state can drive hydrophobic
folding by lowering the value of Tt. The occurrence of the second functional group as part
of the hydrophobically folded state changes its property, which could result, e.g., either in
the uptake of a proton or an electron.

Perspectives  of  Hydrophobic  Domain  Development  and  Protein  Folding 

The preceding formalization of free-energy transduction by means of inverse
temperature transitions resulted from a large number of studies on elastic protein-
based polymers. Being favorably balanced with hydrophobic valyl side chains and
more polar peptide moieties in the backbone such that it exhibited its hydrophobic
folding and assembly transition between room temperature and body temperature,
the parent model elastic protein, poly(VPGVG), provided a fortunate starting
point for the studies. Because it is a repeating pentameric molecular structure with one
position that would support substitution, it became possible to obtain relevant data
on substituents of significance to protein structure and function. The next several
paragraphs introduce these experimentally derived perspectives and correlate them
with current theoretical and experimental studies on protein folding by considering
the ingenious, cooperative, “hydrophobic zipper” model of Dill [28]; a related,
earlier free energy of nucleation arising from hydrophobic contacts facilitated by
β-turns, due to Matheson and Scheraga [29]; and the insightful experimental
studies of Dobson et al. [30,31], which combine three experimental
approaches to suggest a more generalized hydrophobic collapse as an initial step in
protein folding.


Figure 4.  A schematic representation of the dependence of a pKa shift, a ΔpKa, on
hydrophobicity of the model protein capable of exhibiting a hydrophobic folding and assembly
transition. The effect of a particular hydrophobicity change, ⇔, on the pKa depends in a
nonlinear way on the hydrophobicity of the initial state. This is referred to as poising.

The  Theoretical  Hydrophobic  Zipper  Model  by  Dill 

The theoretical “hydrophobic zipper” model by Dill [28] is a binary lattice model
in which there are two types of units in a polymer chain: hydrophobic units (iH)
and polar units (iP). What Dill demonstrated with his model is that given sequences
can result in cooperative hydrophobic folding. With the correct sequence, the
association of a particular pair of hydrophobic units can facilitate the association of
another pair to the growing hydrophobic cluster. The zipper can be initiated in a
segment of polymer chain, (i - n) ··· i ··· (i + n), when a polar moiety i is
flanked on both sides by hydrophobic sequences. The location of the polar i moiety
identifies the location of a potential turn whereby the next-nearest neighbors
(i - 1) and (i + 1) can associate hydrophobically. Doing so, however, places the
next pair of hydrophobic residues, (i - 2) and (i + 2), in proximity such that they
may more readily associate, and so on along the zipper, resulting in the cooperative
formation of a hairpinlike structure held together by a string of hydrophobic contacts.
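
The zipper description above can be turned into a small counting sketch. It follows the pairing as worded in this paragraph, (i - 1) with (i + 1), then (i - 2) with (i + 2), and so on, and it is not an implementation of Dill's lattice model itself. The function name zipper_contacts and the rule that the zipper stops at the first pair that is not H-H are simplifying assumptions.

```python
def zipper_contacts(sequence, turn_index):
    """Count consecutive H-H contacts zipped up around a polar turn unit.

    sequence   : string of 'H' (hydrophobic) and 'P' (polar) units
    turn_index : 0-based position i of the polar unit at the potential turn
    """
    # A zipper is only initiated at a polar unit.
    if sequence[turn_index] != "P":
        return 0
    contacts = 0
    k = 1
    # Walk outward in both directions; stop at a chain end or a broken pair.
    while turn_index - k >= 0 and turn_index + k < len(sequence):
        if sequence[turn_index - k] == "H" and sequence[turn_index + k] == "H":
            contacts += 1
            k += 1
        else:
            break
    return contacts


print(zipper_contacts("HHHPHHH", 3))  # polar unit flanked by hydrophobic runs
print(zipper_contacts("PHHPHHP", 3))  # zipper ends where both units are polar
```

Because each contact is only counted if all inner pairs have already formed, the count reflects the cooperative, sequential character of the zipper rather than an independent sum over pairs.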

This theoretical hydrophobic zipper model can now be made more physical by
replacing the binary code with a relevant hydrophobicity scale as in Figure 1 and
Table II. A real β-turn structure with a 10-atom hydrogen-bonded ring utilizing
the C-O of residue i and the NH of residue i + 3 can be considered to achieve
the folding of the chains as in a hairpin with the resulting formation of a
cross-β-structure as in Figure 5. This is closely equivalent to the Matheson and
Scheraga analysis [29], with the addition here of the hydrophobicity scale. The
particular structure of Figure 5, resulting in an antiparallel alignment of β-chains,
is such that hydrophobic association between chains occurs at residues R_i and
R_{i+3} (as occurred in part in the development of the hydrophobicity scale of
Table II), and then at residues R_{i-2} and R_{i+5}, residues R_{i-4} and R_{i+7},
etc. The resulting structure will be hydrophobic on one side. With a continued
folding back and forth, β-barrels could be formed with hydrophobicity inside as
occurs in extramembrane proteins or with the hydrophobicity outside as occurs in
forming pores or channels in lipid bilayer membranes.
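
The pairing pattern described above (residues i and i + 3 at the turn, then i - 2
and i + 5, and so on outward) can be written as a short sketch. The function name
and the example index are illustrative assumptions, not part of the original
analysis.

```python
def zipper_pairs(i, n_pairs):
    """Residue-index pairs of a cross-beta hairpin whose turn starts at residue i.

    The k-th contact pairs residue (i - 2k) with residue (i + 3 + 2k),
    giving (i, i+3), (i-2, i+5), (i-4, i+7), ...
    """
    return [(i - 2 * k, i + 3 + 2 * k) for k in range(n_pairs)]


# Hypothetical example: a turn starting at residue 10, first three contacts.
example_pairs = zipper_pairs(10, 3)
print(example_pairs)
```

Each successive pair lies two residues further from the turn on both strands, which
is why the resulting sheet carries its hydrophobic side chains on one face.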

The hydrophobicity scale can be used to form a single-residue hydrophobicity
plot for a given primary structure, followed by an odd-and-even sequence editing
to identify β-chains with a hydrophobic sidedness as would occur in the
hydrophobically folded β-barrels of the fibronectin type 3 domains [32,33]. Similar
sequence editing could be used to identify amphiphilic α-helices.

[Figure 5 graphic: Hydrophobic Folding of Cross-β-Structures (Cα-Cα virtual bond
representation)]

Figure 5. The “hydrophobic zipper” model of Dill put in real structural terms of a
β-turn and a cross-β-structure where the hydrophobic side chain pairing occurs with
residues i and i + 3, i - 2 and i + 5, etc. The hydrophobicity scale of Table II can
be used to compare relative hydrophobicities by summing and normalizing each possible
cross-β-structure that could form from a given primary structure. Those structures
with the lowest normalized value of T_t would be the most favored.

The next step toward a more correct physical model is to introduce the interactions
between hydrophobic (apolar) and polar groups that are responsible for triggering
the energy conversions listed in Table I, i.e., those interactions that control whether
hydrophobic folding will occur at a particular temperature. From the studies on
the elastic protein-based polymers comes the perspective that a sufficiently polar
group can exert such a pull on the surrounding water in achieving its own hydration
that it can destructure sufficiently proximal waters of hydrophobic hydration, thereby
removing the thermodynamic driving force for hydrophobic folding [1]. Alternatively,
in this competition for hydration, it should be appreciated that sufficiently
hydrophobic domains can raise the pKa of a carboxyl group, for example. The
important point here is that the cooperative “hydrophobic zipper” model of Dill
provides the foundation whereby it becomes possible to think of a more physical
treatment of cooperative hydrophobic folding in proteins. It might also be
appreciated that the cooperative hydrophobic folding to form the β-spiral structure of
poly[f_V(VPGVG), f_X(VPGXG)], as has been discussed for many years (see [1]
and references therein), is, in the terminology of Dill, an example of a helical
hydrophobic zipper.

The  Experimental  Hen  Lysozyme  Studies  of  Dobson 

An emerging general perspective from the experimental side of protein folding
is seen in the elegant work of Dobson and co-workers on hen lysozyme [30,31].
Its elegance arises from the temporal comparison of stopped-flow far-UV circular
dichroism studies, pulsed hydrogen exchange labeling, and fluorescence intensity
studies. From the stopped-flow far-UV circular dichroism studies on hen lysozyme,
the circular dichroism pattern of the native protein was obtained after 4 ms,
indicating that the average relative orientation of peptide backbone moieties was
consistent with the native α-helix and β-structure. The pulsed hydrogen-exchange
studies indicated essentially no slowing of peptide NH exchange as would be expected
of stable α-helix and β-structure, suggesting that the structure was being held
together by tertiary structure interactions. The fluorescence intensity studies using
native tryptophan residues argued that hydrophobic folding occurred within the first
few milliseconds of the refolding process. This caused Dobson and co-workers [30] to
suggest that the fundamental question is “the degree to which secondary structure
formation in the first steps of folding is accompanied by global structural collapse,
for example, by hydrophobic side chain interactions, or indeed whether secondary
structure is formed as a consequence of such a collapse” with reference to
Dill [28].

Realizing that circular dichroism does not report directly on the hydrogen bonding
of secondary structure but rather on the relative orientation of peptide moieties
that are associated with the α-helix and β-structures, the studies suggest that
hydrophobic folding is an early event that can then stabilize secondary structure. As
suggested by Dill [28], this provides an answer to the Levinthal paradox that there
is not sufficient time for a protein chain to sample all of the possible torsional angle
conformations in the process of finding the global minimum. It suggests that forming
energetically favorable hydrophobic domains can be a dominant protein folding
process, and as we have reported above, controlling hydrophobic domain formation
can be the means with which to achieve all of the energy conversions of which
living organisms are capable.

Abstract

A new class of calcium antagonists (dibenzotricyclic compounds) is studied by means of reaction field
ab initio calculations and molecular dynamics simulations. The central ring of these tricyclic molecules
is found to be more important to the calcium antagonistic potency than the two phenyl rings. The central
ring with antagonistic potency shows hydrophobic character; thus the interaction between the drug and
the binding sites is assumed to be dominated by hydrophobic interactions. Variation of the flexure angle,
the angle between the two phenyl rings, does not change the hydrophobic property of the central ring
significantly; therefore it is not expected to affect the interaction between the drug and binding site
directly. The effect of the flexure angle on calcium antagonistic potency, the relation between drug affinity
of these tricyclic molecules and their ionization energies, and the interaction of calcium ions with the
central ring are discussed. © 1994 John Wiley & Sons, Inc.


Introduction 

Calcium ions (Ca2+) play a vital role in many biological processes including a
variety of enzymatic reactions, activation of excitable cells, coupling of electrical
activation and cellular secretion, and homeostasis. The regulation of the intracellular
concentration of this ion makes it possible to control Ca2+-dependent processes.
According to the World Health Organization [1], calcium antagonists are a
chemically heterogeneous group of compounds that modify Ca2+ mobilization or Ca2+
action. Verapamil [2], diltiazem [3], and nifedipine [4] are among the representatives
of the first generation of calcium antagonists, currently used for the treatment of
angina and hypertension. Calcium antagonists can affect cardiovascular hemodynamics
by three principal actions: coronary arterial dilation, peripheral arterial
dilation, and a negative inotropic effect. For example, nifedipine dilates the renal
arteries by abolishing autoregulation and it also dilates the cerebral arteries. This
widespread vasodilation reduces systemic vascular resistance both in animals and
in humans, increasing both contractility and heart rate [5]. It has also been reported
that diltiazem may cause serious adverse effects such as cutaneous vasculitis [6],
thrombocytopenia [7], heart block [8], parkinsonism [9], and fatal renal and
hepatic toxicity [10].

Attempts to find tissue selective calcium antagonists have been made for more
than a decade. Recently, a series of compounds with dibenzotricyclic structures
(DBTs) was reported as a new class of calcium antagonists having selectivity for
cardiac tissue over vascular tissue [11]. These DBTs, for example, I-III in Figure
1, may confer antagonistic activity without a side effect on blood pressure. The
structure-activity studies show that calcium antagonistic activity is closely related
to the flexure angle, the angle between the planes of the two phenyl rings in DBTs.
Most recently, a series of compounds such as dibenzothiazepinone (IV-V in Figure
1) has been synthesized [12]. These compounds are expected to have smaller flexure
angles due to a planar amide bond in the central ring of the tricyclic system. Thus
enhanced calcium antagonistic activities of these compounds are anticipated.

Due to increasing interest in calcium ion channels and in the interaction between
calcium ion-channel related binding sites (receptors) and drugs (ligands), we carried
out computational chemical studies of these dibenzotricyclic molecules. The purpose
of this article is to investigate the conformations and electrostatic energy distributions
of some model molecules related to calcium antagonistic molecules. Five model
molecules (1-5 in Figure 2) are chosen to model the respective candidate drug
molecules shown in Figure 1. Ab initio calculation, which includes a reaction field
to take into account the solvent effect, was employed in this study. Particular
attention was paid to the bridge region of these tricyclic compounds and to their
flexure angles. In order to understand the interaction between dibenzotricyclic
molecules and water and between the calcium ions and these molecules, molecular
dynamics (MD) simulations have been carried out for some model tricyclic molecules
in solution. The article is organized as follows. In the second section we briefly
describe the computational details of ab initio calculations and molecular dynamics
simulations. The optimized geometries, atomic charges, electrostatic energies, and
flexure angles of these model tricyclic molecules are compared in detail in the third
section. The results of the interactions of these model molecules with water and
Ca2+ ions are discussed in the fourth and fifth sections, respectively.

[Figure 1 structure labels: I (X = CH2O); II (X = CH2S); III (X = CH=CH)]

Figure 1. The structures of some new dibenzotricyclic calcium antagonists.

Figure 2. The structures of model molecules used in the present study.

Computational  Details 

The conformations, atomic charges, and electrostatic potential were calculated
by ab initio (Hartree-Fock, HF) methods using the Gaussian 92 program [13]. The
geometries were optimized at the HF/STO-3G level using gradient methods. The
self-consistent reaction field (SCRF) method was employed in the calculation to
allow for the solvent effect on the electronic features of these molecules and to
model the “true” molecular environment during geometry optimization. In this
approach, the solute is placed in a spherical cavity immersed in a continuous
medium with a dielectric constant ε. A molecular dipole will induce a dipole in the
medium and in return, the electric field applied to the solute by the solvent reaction
dipole will lead to a change of the solute dipole; this interaction is repeated until
the system reaches a net stabilization. A dielectric constant of 78.25 is used for the
water solvent, and the volume of the spherical cavity of model molecules is defined
as the volume inside a contour of 0.001 electron/bohr^3 density. The atomic charges,
which are fitted to the electrostatic potentials at points selected according to the
Merz-Singh-Kollman scheme [14], were calculated at the HF/6-31G(d,p) level
using the HF/STO-3G optimized geometries. The electrostatic potentials at nuclei
are calculated at the same level as well.
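
The mutual polarization loop described above can be illustrated with a toy model.
The sketch below assumes the Onsager dipole form of the reaction field,
R = g·μ with g = 2(ε - 1)/((2ε + 1)a^3), and replaces the quantum-mechanical
response of the solute with a simple linear polarizability. The cavity radius,
gas-phase dipole, and polarizability are hypothetical values chosen only for
illustration; they are not taken from the present calculations.

```python
def onsager_factor(epsilon, cavity_radius):
    """Reaction-field factor g = 2(eps - 1) / ((2 eps + 1) a^3), so that R = g * mu."""
    return 2.0 * (epsilon - 1.0) / ((2.0 * epsilon + 1.0) * cavity_radius ** 3)


def self_consistent_dipole(mu0, alpha, g, tol=1e-12, max_iter=1000):
    """Iterate mu = mu0 + alpha * (g * mu) until the dipole stops changing.

    mu0   : gas-phase dipole (hypothetical, atomic units)
    alpha : linear polarizability standing in for the solute response (toy assumption)
    """
    mu = mu0
    for _ in range(max_iter):
        new_mu = mu0 + alpha * g * mu
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    raise RuntimeError("reaction-field iteration did not converge")


EPS_WATER = 78.25   # dielectric constant quoted in the text
CAVITY_RADIUS = 8.0  # bohr, hypothetical
g_water = onsager_factor(EPS_WATER, CAVITY_RADIUS)
mu_solvated = self_consistent_dipole(mu0=1.0, alpha=50.0, g=g_water)
print(g_water, mu_solvated)
```

In an actual SCRF calculation the solute response comes from re-solving the
Hartree-Fock equations in the presence of the reaction field at each cycle; the
linear model is used here only to show why the dipole and the field must be
iterated to self-consistency.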

Molecular dynamics simulations were carried out for aqueous solutions of
molecules 1 through 5. The simulations were performed without and with added CaCl2
salt. All simulations were started by placing two drug molecules in the simulation
cell together with 254 water molecules. In simulations with added salt, six randomly
chosen water molecules were replaced with two Ca2+ and four Cl- ions. The systems
were equilibrated for 50 ps, followed by a production run of 70 ps. The atom-atom
radial distribution functions (RDFs) have been used to examine the structure
of simulated aqueous solutions. The RDF g(r_ij) describes the spatial distribution of
atom j around atom i and gives the probability of finding a pair of atoms ij at the
distance r apart, relative to the probability expected for a completely random
distribution at the same density, i.e., bulk density, ρ0 [15]. The integral of g(r_ij):

$$z(r_{ij}) = 4\pi\rho_0 \int_0^{r} g(r_{ij})\, r^2\, dr$$

gives the number of atoms j around atom i as a function of the distance r and is
called the coordination number (integral) of the RDF.
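
As an illustration of the coordination-number integral, the following minimal
sketch integrates a tabulated RDF with the trapezoid rule. The grid, the uniform
g(r) = 1 test curve, and the density value are assumptions made for the example
and are not data from the simulations.

```python
import math


def coordination_number(r, g, rho0):
    """Cumulative z(r) = 4*pi*rho0 * integral_0^r g(r') r'^2 dr' (trapezoid rule).

    r    : increasing list of distances starting at 0
    g    : RDF values on the same grid
    rho0 : bulk number density of atom j (same length units as r)
    Returns a list with z evaluated at every grid point.
    """
    integral = [0.0]
    for k in range(1, len(r)):
        f_prev = g[k - 1] * r[k - 1] ** 2
        f_curr = g[k] * r[k] ** 2
        integral.append(integral[-1] + 0.5 * (f_prev + f_curr) * (r[k] - r[k - 1]))
    return [4.0 * math.pi * rho0 * value for value in integral]


# Sanity check with a structureless fluid, g(r) = 1, where z(R) = (4/3)*pi*rho0*R^3.
RHO0 = 0.0334                      # Angstrom^-3, assumed illustrative density
r_grid = [k / 100.0 for k in range(301)]   # 0 to 3 Angstrom
g_uniform = [1.0] * len(r_grid)
z_curve = coordination_number(r_grid, g_uniform, RHO0)
print(z_curve[-1])
```

For a real RDF the coordination number of the first shell is read from this
running integral at the position of the first minimum of g(r).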

The drug molecules are kept rigid throughout the simulation. This is, of course,
only an approximation but it can be justified by the stiffness of the tricyclic
compounds treated in the present study. It is important to maintain the same molecular
geometry in the simulations as the one used to calculate the atomic charges and
electrostatic potentials. Another reason to maintain a rigid geometry is to study the
effects of varying the flexure angle of the drug compounds on intermolecular
(between drug and water or between drug and Ca2+) interactions. In a forthcoming
article, we will report MD simulations for compounds 1 through 5 allowing full
flexibility and we will investigate the effect of attaching a hydrocarbon chain.

Standard  Lennard-Jones  atom-atom  potentials  were  used  for  the  drug  molecules 
and  the  ions.  Atomic  charges  are  calculated  in  the  present  work  and  discussed  in 
the  next  section.  Rigid  simple  point  charge  (SPC)  water  [16]  molecules  were  used 
in  all  simulations.  Standard  combination  rules  were  employed  to  construct  the 
cross-interaction  potentials  between  unlike  atoms.  The  potential  parameters  are 
given  in  Table  I.  The  simulations  are  carried  out  in  an  NVT  ensemble  at  300  K, 
using  normal  densities.  The  simulation  software  is  a  modified  version  of  the 
McMoldyn  package  [17]. 
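
A minimal sketch of how cross-interaction potentials can be built from the Table I
parameters follows. It assumes the 12-6 Lennard-Jones form
4ε[(σ/r)^12 - (σ/r)^6] and the Lorentz-Berthelot combination rules (geometric mean
for ε, arithmetic mean for σ); the text says only that "standard" rules were used,
so both choices are assumptions. The parameter values are copied from Table I as
printed.

```python
import math

# (epsilon in kcal/mol, sigma in Angstrom), copied from Table I as printed
LJ_PARAMETERS = {
    "C": (0.113, 3.350),
    "N": (0.082, 3.310),
    "H": (0.113, 3.350),
    "O": (0.136, 2.950),
    "S": (0.404, 3.520),
    "Ca2+": (1.206, 4.500),
    "Cl-": (0.118, 2.209),
}


def combine(atom_a, atom_b):
    """Lorentz-Berthelot rules (assumed): eps_ab = sqrt(eps_a*eps_b), sig_ab = (sig_a+sig_b)/2."""
    eps_a, sig_a = LJ_PARAMETERS[atom_a]
    eps_b, sig_b = LJ_PARAMETERS[atom_b]
    return math.sqrt(eps_a * eps_b), 0.5 * (sig_a + sig_b)


def lj_energy(atom_a, atom_b, r):
    """12-6 Lennard-Jones energy in kcal/mol at separation r (Angstrom)."""
    eps, sig = combine(atom_a, atom_b)
    x = (sig / r) ** 6
    return 4.0 * eps * (x * x - x)


eps_cao, sig_cao = combine("Ca2+", "O")
print(eps_cao, sig_cao, lj_energy("Ca2+", "O", 3.0))
```

The electrostatic part of the interaction, built from the fitted atomic charges,
would be added to this term in the full potential.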

Ab  Initio  Results  and  Discussions 

Analysis  of  Atomic  Charges  and  Electrostatic  Potentials 

The computed atomic charges and electrostatic potentials at heavy atoms for
molecules 1, 2, 3, 4a, 4b, and 5 are tabulated in Tables II and III, where 4a
corresponds to a HF/STO-3G optimized structure, while 4b corresponds to an
enlarged flexure angle (124.8° and 169.9° for 4a and 4b, respectively). Replacement
of the oxygen atom of 1 by a sulphur atom results in noticeable changes of the
charge density distribution in the bridge region. For example, the charges on C1 and
C2 in 2 are roughly double those in 1, and the charge at C3 decreases from
0.003 in 1 to -0.124 in 2. The C4 and C6 charges also decrease by more than
a factor of two from 1 to 2. Comparing the charge distribution of 3 with that of 1, we
observe that, as from 1 to 2, the charges at C2 and C5 increase while those at C4
and C6 decrease. In general, the bridge-region atoms show larger changes in charge
than the phenyl-ring atoms from 1 to 2 or from 1 to 3. The fact that 2 and 3
represent molecules having greater antagonistic potencies than 1 may indicate a
favored interaction of drug and binding sites occurring at the bridge region of these
tricyclic molecules, rather than at the phenyl rings as suggested by Kurokawa et al. [11].

Table III lists the electrostatic potential at corresponding nuclei calculated at the
HF/6-31G(d,p) level with HF/STO-3G optimized geometries. Relative to 1, 2 and


Table I. Lennard-Jones interaction parameters used in the MD simulation.

Atom     ε (kcal/mol)    σ (Å)
C           0.113        3.350
N           0.082        3.310
H           0.113        3.350
O           0.136        2.950
S           0.404        3.520
Ca2+        1.206        4.500
Cl-         0.118        2.209


Table II. Atomic charges at the HF/6-31G**//HF/STO-3G level.a,b

No. of                                Model molecules
atom      1           2             3             4a            4b            5
 1     C -.369    C -.786(-)    C -.344(+)    O -.325(+)    O -.422(-)    S -.097(+)
 2     C  .175    C  .439(+)    C  .197(+)    C  .261(+)    C  .481(+)    C -.038(-)
 3     C  .003    C -.124(-)    C  .102(+)    C -.099(-)    C -.217(-)    C  .073(+)
 4     C  .089    C  .032(-)    C -.220(-)    C  .701(+)    C  .866(+)    C  .666(+)
 5     O -.404    S -.280(+)    C -.198(+)    N -.657(-)    N -.838(-)    N -.772(-)
 6     C  .227    C  .007(-)    C  .112(-)    C  .257(+)    C  .418(+)    C  .421(+)
 7     C  .152    C  .391(+)    C  .181(+)    C  .168(+)    C  .197(+)    C -.088(-)
 8     C -.245    C -.309(-)    C -.254(-)    C -.164(+)    C -.224(+)    C  .014(+)
 9     C -.171    C -.137(+)    C -.141(+)    C -.202(-)    C -.199(-)    C -.285(-)
10     C -.157    C -.153(+)    C -.161(-)    C -.123(+)    C -.123(+)    C -.078(+)
11     C -.208    C -.188(+)    C -.212(-)    C -.261(-)    C -.339(-)    C -.319(-)
12     C -.224    C -.322(-)    C -.255(-)    C -.219(+)    C -.368(-)    C -.002(+)
13     C -.161    C -.117(+)    C -.144(+)    C -.104(+)    C -.026(+)    C -.208(-)
14     C -.136    C -.157(-)    C -.155(-)    C -.156(-)    C -.205(-)    C -.074(+)
15     C -.195    C -.193(+)    C -.205(-)    C -.143(+)    C -.150(+)    C -.229(-)
16                                            O -.569       O -.619       O -.556

a Atomic charges were calculated by fitting to the electrostatic potentials at points selected according
to the Merz-Singh-Kollman scheme [14].

b The “+” sign in the parentheses indicates greater atomic charges with respect to that of atoms at
respective positions of 1, while the “-” sign corresponds to a smaller value relative to that of 1.


3 have the same qualitative change of the electrostatic potential at the C2, C4, C6,
C9, and C10 positions. The same conclusion applies to the atomic charges except
the trend for C10. This implies that the change of charges and electrostatic potentials
at these positions can be used as one of the indices to discuss the antagonistic
activity of some new candidate molecules that have similar structures, such
as 1-3.

The newly synthesized compounds dibenzoxazepinone and dibenzothiazepinone
[12] are modelled by 4a and 5, respectively. The atomic charges and electrostatic
potentials at the heavy atoms of these two model molecules are listed in Tables II
and III. Relative to 1, 4a and 5 show changes of charges and electrostatic
potentials at the positions of C2, C4, C6, and C9 that are in general opposite to
those of 2 and 3. For instance, 4a and 5 correspond to increased charges and
electrostatic potentials at the C4 and C6 positions with respect to those of 1,
whereas 2 and 3 have relatively decreased values at the same positions in comparison
with 1. There are increased charges and decreased electrostatic potentials at C9 of
5 and 6 relative to 1, respectively, but the converse holds at the C9 position of 2
and 3. These opposite changes in charges and electrostatic potentials indicate that
4a and 5 will not have as great Ca2+ antagonistic potencies as 2 and 3, if 4a and 5
are assumed to interact with the same binding sites as 2 and 3. In our previous
study [18], we found that 3 corresponds to a positive electrostatic potential near
the bridge region, whereas 4a and 5 correspond to a negative electrostatic potential
in the same region. Therefore, if molecules 4a and 5 do exhibit calcium antagonistic
potency experimentally, they probably interact with different binding sites or with
different states of the binding sites of 2 or 3.

Table III. HF/6-31G**//HF/STO-3G electrostatic potential at nuclei (a.u.).

Flexure  Angle  Effects 

In a recent structure-activity study [11] it was found that the flexure angles between
the two phenyl rings in these tricyclic molecules are important to the Ca2+
antagonistic activity. The HF/STO-3G SCRF optimizations yield flexure angles of 120.6,
122.5, 114.5, 124.8, and 112.6° for molecules 1, 2, 3, 4a, and 5, respectively.
Model molecule 2 has a larger angle than 1 but 3 has a smaller angle than 1. Since
2 and 3 both represent molecules having more potent antagonistic activities than 1,
the flexure angle evidently does not directly affect the chemical interaction between
the drug and binding site of the receptor. The effect of the flexure angle may be due
to a specific structural requirement of receptors to ligands or to the interaction of
drugs with other molecules while approaching the binding sites. We will examine the
flexure angle effect on the interaction between drug and water in detail in the
section on the interaction of drug molecules with water.
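
For readers who wish to evaluate a flexure angle from coordinates, the following
is a minimal sketch. It assumes that each phenyl ring plane is defined by three of
its atoms and that the flexure angle is 180° for a flat tricyclic system, i.e.,
180° minus the acute angle between the two plane normals. The coordinates are
hypothetical and do not come from the optimized geometries.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_normal(p0, p1, p2):
    """Unit normal of the plane through three non-collinear points."""
    n = _cross(_sub(p1, p0), _sub(p2, p0))
    length = math.sqrt(_dot(n, n))
    return (n[0] / length, n[1] / length, n[2] / length)


def flexure_angle(ring_a, ring_b):
    """Angle (degrees) between two ring planes, with 180 meaning coplanar rings.

    Uses the absolute value of the normal dot product, so the result lies
    between 90 and 180 degrees (an assumption suited to butterfly-shaped tricyclics).
    """
    n_a = plane_normal(*ring_a)
    n_b = plane_normal(*ring_b)
    cos_between = min(1.0, abs(_dot(n_a, n_b)))
    return 180.0 - math.degrees(math.acos(cos_between))


# Hypothetical geometry: two planes hinged on the y axis and opened to 120 degrees.
phi = math.radians(120.0)
ring_one = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 0.0, 0.0)]
ring_two = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (math.cos(phi), 0.0, math.sin(phi))]
print(flexure_angle(ring_one, ring_two))
```

With real structures a least-squares plane through all six ring atoms would be a
more robust choice than three selected atoms.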

The  Affinities  of  Drug  Molecules 

The  interactions  between  calcium  antagonists  and  their  binding  sites  are  chemical 
interactions  [19].  One  of  the  important  indices  to  measure  the  activity  of  a  drug 
is  its  affinity.  The  affinity  of  a  drug  molecule  is  closely  related  to  its  ionization 
energy  or  its  electron  affinity.  Table  IV  summarizes  the  values  calculated  at  the 
HF/6-31G(d,p)  and  HF/STO-3G  levels  with  HF/STO-3G  optimized  geometries  for 
the  model  molecules  1,  2,  3,  4a,  and  5,  respectively.  The  ionization  energies  and 
electron  affinities  were  estimated  by  means  of  Koopmans’  theorem  [20],  which, 
of  course,  is  an  approximation,  but  nonetheless  useful  for  qualitative  discussions. 

The HF/6-31G(d,p) and HF/STO-3G calculations yield similar orders of
ionization energies and electron affinities. Model molecules 2 and 3, which have greater
antagonistic potencies, correspond to smaller ionization energies than 1, while the
electron affinities of both 2 and 3 are less than that of 1. In view of the positive
electrostatic potential near the bridge region [18], we suggest that these tricyclic
molecules interact with the binding sites by donating electrons into the sites. Thus,
the ionization energy, which measures the energy required to remove an electron
from a molecule, can be used to measure the affinity of a drug molecule. Therefore,
molecules 2 and 3 have a greater drug affinity than 1 due to their smaller ionization
energies. Even though the ionization energies of 4a and 5 are smaller than that of
1, they are still greater than the corresponding values of 2 and 3. Consequently,
we predict that 4a and 5 will have lower antagonistic potencies than 2 and 3. Of
course, this prediction rests on the assumption that they interact with the same
binding sites as 2 and 3, as we already pointed out in the previous sections.


Table IV. Approximate ionization energies (I) and electron affinities (A) at
the HF/STO-3G and HF/6-31G(d,p) levels (a.u.).a

                  HF/STO-3G              HF/6-31G(d,p)
Molecule        I          A            I          A
1            0.26059    0.25764      0.31337    0.13360
2            0.23363    0.21752      0.28927    0.10623
3            0.21681    0.25676      0.29018    0.13403
4a           0.24763    0.21214      0.32143    0.09032
5            0.24217    0.21614      0.32496    0.08688

a All calculations used HF/STO-3G optimized geometries.
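
The ordering argument above can be checked with a few lines of code. The sketch
below takes the HF/STO-3G ionization energies from Table IV, converts them from
atomic units to electron volts, and sorts the model molecules from smallest to
largest ionization energy. The conversion factor (1 hartree ≈ 27.2114 eV) and the
helper names are additions for illustration.

```python
HARTREE_TO_EV = 27.2114  # approximate conversion factor

# HF/STO-3G ionization energies (a.u.) from Table IV
IONIZATION_STO3G = {
    "1": 0.26059,
    "2": 0.23363,
    "3": 0.21681,
    "4a": 0.24763,
    "5": 0.24217,
}


def to_ev(energy_hartree):
    """Convert an energy from hartree (a.u.) to electron volts."""
    return energy_hartree * HARTREE_TO_EV


def rank_by_ionization(energies):
    """Molecule labels ordered from smallest to largest ionization energy."""
    return sorted(energies, key=energies.get)


ranking = rank_by_ionization(IONIZATION_STO3G)
for label in ranking:
    print(label, round(to_ev(IONIZATION_STO3G[label]), 3), "eV")
```

Under the electron-donation picture proposed in the text, molecules earlier in
this list would be expected to show the greater drug affinity.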

The  Interaction  of  Drug  Molecules  with  Water 

Since  water  is  the  natural  environment  in  a  biological  system,  hydration  and 
other  specific  interactions  between  drugs  and  water  are  of  importance  in  determining 
the  action  of  drugs  in  the  human  body.  Furthermore  this  information  can  provide 
insight  into  the  interaction  between  the  drug  molecules  and  their  binding  sites. 
Molecular dynamics simulation results are reported, focusing on the interaction
between water and the bridge region of these model molecules. The structure of
water around the central ring was examined using radial distribution functions
g(X-Ow), where X refers to the atoms of the central ring and Ow to water oxygen.
The coordination numbers (integrals) of the RDFs are also included in the figures,
drawn with the same line pattern as the corresponding RDF.

Figure 3 displays radial distribution functions of the water oxygen around the
C1, C4, and O5 atoms of the central ring of 1. The shape of g(C1-Ow) clearly
indicates that the interaction between water and C1 is hydrophobic. The same
observation holds for the interaction between water and C4. On the other hand,
the peak at 2.8 Å in g(O5-Ow) (the dashed line) indicates a weak hydrogen bond
between O5 and water; its corresponding coordination number is 0.66. Replacement
of the oxygen in the central ring of 1 by sulphur gives rise to a hydrophobic central
ring, as illustrated in Figure 4. The same conclusion holds for 3. Figure 5 illustrates
the corresponding radial distribution functions for 3. Overall, the curve shapes of
Figure 5 are closer to those of Figure 4 than to those of Figure 3. This indicates
that the hydrophobic bridge region in these tricyclic compounds will enhance the
antagonistic activities of the drug. It also suggests that the interaction between
these tricyclic molecules and their binding sites is dominated by hydrophobic
interaction, as suggested by Triggle et al. in the case of dihydropyridine [19].

Figure 3. The radial distribution functions and corresponding coordination numbers of
water oxygen with bridge atoms of 1.

Figure 4. The radial distribution functions and corresponding coordination numbers of
water oxygen with bridge atoms of 2.

Figure 5. The radial distribution functions and corresponding coordination numbers of
water oxygen with bridge atoms of 3.

We have also carried out MD simulations on model molecules 4a, 4b, and 5.
Figures 6 and 7 show the related radial distribution functions for 4a and 4b,
respectively. The small peak at 2.8 Å of g(O1-Ow) (dotted line in Figure 6)
indicates that there is weak hydrogen bonding between O1 of 4a and water due to the
electronegative oxygen in the bridge region. Although the nitrogen in the bridge
region does not appear to have a strong interaction with water (cf. solid line in
Figure 6), the O16 of 4a has a strong interaction with water through hydrogen
bonding. With reference to the previous discussion in this section about binding
sites favoring a hydrophobic bridge region in tricyclic molecules, 4a is unlikely to
show stronger antagonistic potency than 2 or 3 at the same binding sites. The same
conclusion applies to model molecule 5 as well.

It is interesting to see how the flexure angle between the two phenyl rings affects
the interaction between water and model drug molecules. Sterically, a smaller flexure
angle will more or less block water from entering the bridge region, thus a weaker
interaction is expected. Figure 7 shows the radial distribution functions of water
oxygen with the O1, N4, and O16 atoms of 4b. The only difference between 4a
and 4b is the flexure angle: 124.8° vs. 169.9°. Surprisingly, we did not observe an
enhanced interaction between water and O1 or between water and N4 atoms in
4b. Instead, the hydrogen bond between water and O1 of 4a is slightly weaker in
4b due to the enlarged flexure angle (Figure 7 vs. Figure 6). Also, nitrogen and
water interactions become weaker as indicated by the reduced intensity of the radial
distribution function and the closest distance in the RDF (cf. Fig. 7). The interaction
between water and O16 is slightly reduced as well after the flexure angle is enlarged.
The coordination numbers corresponding to g(O16-Ow) (dashed lines in Figures
6 and 7) are 1.14 and 1.06 for 4a and 4b, respectively. By comparing Figures 6
and 7, we note that in general an enlarged flexure angle does not change significantly
the hydrophilicity of the bridge region in the model drug molecules, and therefore
it does not affect the interaction between the drug molecules and their binding sites
significantly. This suggests that the effect of the flexure angle on Ca2+ antagonistic
activity is mainly due to steric blocking of drug molecules entering the binding
sites.

Figure 6. The radial distribution functions and corresponding coordination numbers of
water oxygen with bridge atoms of 4a.

Figure 7. The radial distribution functions and corresponding coordination numbers of
water oxygen with bridge atoms of 4b.


The  Interaction  between  Calcium  Ions  and  Drug  Molecules 

Of  particular  significance  to  the  calcium  antagonist  study  is  whether  or  not  the 
antagonists  interact  with  calcium  ions.  So  far  there  is  no  clear  experimental  evidence, 
although  there  have  been  some  reports  of  the  Ca2+  dependence  of  the  drugs’  binding 
to the calcium channels [21,22]. The allosteric interaction between different classes
of calcium channel antagonists at the dihydropyridine receptor also seems to be
Ca2+-dependent [21]. Recently, Ananthanarayanan et al. [23] studied the interaction
of calcium channel antagonists (diltiazem) with calcium by means of NMR


Figure 8. The radial distribution functions of calcium ion with bridge atoms of 1.


spectroscopy  and  molecular  modeling.  We  present  here  our  preliminary  results 
based  on  molecular  dynamics  simulations. 

Similar  to  the  previous  section,  rdfs  are  used  to  examine  the  probability  of 
finding  Ca2+  around  the  model  drug  molecule.  Figures  8  and  9  show  the  radial 
distribution  functions  of  Ca2+  ions  with  the  bridge  region  atoms  for  model  molecules 


Figure 9. The radial distribution functions of calcium ion with bridge atoms of 2.

1 and 2, respectively. We should point out that, although the curves in these figures
are not smooth because of the relatively short simulation time (140 ps), some
information can still be extracted from them.

The shapes of g(C1-Ca2+), g(C4-Ca2+), and g(O5-Ca2+) indicate that there
is no direct interaction between Ca2+ and model molecule 1. The “contact” distances
in the rdfs are approximately 9.4, 7.1, and 6.9 Å, respectively. Peaks near 8.3 Å
can be expected in longer runs for the C1-Ca2+ and C4-Ca2+ pairs. A peak that
appears near 9.8 Å corresponds to the O5-Ca2+ pair. This is attributed to Ca2+ interacting
with the drug central ring through water, i.e., a strongly hydrated Ca2+
surrounded by water interacts with the hydrated drug. In addition, the sharp increase
of the rdfs from the starting distances in Figure 8 shows that Ca2+ is unable to come
closer to the bridge ring of the drug molecule because of the positive electrostatic
potential near this region. The bridge region is thus “hard” to Ca2+ ions. However,
replacement of oxygen in the central ring of 1 by sulfur causes molecule 2 to become
“soft” to the Ca2+ ions. The contact distances of the rdfs for model molecule 2
(Fig. 9) are shorter than the corresponding distances for model molecule 1. Similar
observations were made in the case of 3. On the basis of these preliminary results,
we propose that there is no direct interaction between Ca2+ ions and the central
ring of these model drug molecules.

Conclusions 

In  this  work  a  new  class  of  calcium  antagonists  (dibenzotricyclic  compounds) 
has  been  studied  by  means  of  reaction  field  ab  initio  calculations  and  molecular 
dynamics simulations within the rigid model approach. Relatively greater changes
in atomic charges and electrostatic potentials are found in the central ring than in the
two phenyl rings of the model tricyclic drug molecules; hence the central ring of
these molecules may be more important to the calcium antagonistic potency than
the two phenyl rings. The molecular dynamics study of the interaction between
the  central  ring  of  the  drug  molecule  and  water  indicates  that  a  central  ring  with 
great  antagonistic  potency  has  mainly  hydrophobic  character,  and  therefore,  the 
interaction  between  the  drug  and  binding  sites  may  be  dominated  by  hydrophobic 
interactions. Variation of the flexure angle does not change the hydrophobic property
of the central ring significantly, and therefore it is not expected to affect the
interaction between the drug and binding site directly. The effect of the flexure
angle on calcium antagonistic potency may be due to the specific structural requirement
of the binding sites for the ligand or to the interaction with other molecules
when the drug molecule approaches the receptor. The drug affinity of these tricyclic
molecules  can  be  closely  related  to  their  ionization  energies;  the  drug  action  may 
require  donation  of  an  electron  from  the  drug  molecule  to  the  binding  site.  The 
calcium  ions  do  not  coordinate  or  interact  directly  with  the  central  ring  of  the 
model  drug  molecules,  but  rather  they  interact  through  the  water. 
Abstract

We use molecular dynamics, electrostatic, and quantum chemical calculations to discuss chromophore
and protein structural changes as well as proton transfer pathways in the first half of the bacteriorhodopsin
photocycle. A model for the molecular mechanism is presented, which accounts for the complex pH
dependence of the proton release and uptake pattern found for the M intermediates. The results suggest
that transient transfer of the Schiff base proton to a nearby tightly bound water molecule is the primary
step, which is accompanied by dissipation of free energy to the protein. From there, the energetically
most favorable proton transfer is to aspartate D85. Arginine R82 is involved in the protein reorientation
switch, which catalyzes the pKa reduction of glutamate E204. This residue is, therefore, identified as the
extracellular proton release group whose acid-base equilibrium regulates the pH-dependent splitting of
the photocycle. © 1994 John Wiley & Sons, Inc.


Introduction 

Bacteriorhodopsin (BR) is a light-driven proton-pumping protein in the cell
membrane of Halobacterium halobium. The photoinduced isomerization of the
Schiff base linked retinal chromophore (RSB) from all-trans to 13-cis (for recent
reviews see, e.g., [1-3]) initiates a sequence of thermal reactions with an overall
turnover time of about 10 ms.

The  photocycle  consists  of  spectroscopically  distinct  principal  intermediate  states, 
labeled  J,  K,  L,  M,  N,  and  O.  Several  models  account  for  the  complex  time  courses 
of  their  rise  and  decay,  which  introduce  either  parallel  photocycles  with  slightly 
different,  unidirectional  kinetic  steps  (e.g.,  [4-7])  or  one  photocycle  with  reversible 
reactions  between  the  intermediates  (e.g.,  [8,9]).  By  the  time  the  L  state  is  reached 
the change in the retinal geometry is communicated to the protein. The subsequent
proton translocations are based on directed sequential alterations in the pKa’s of
the retinal Schiff base and strategically located titratable residues. Site-directed mutagenesis
studies (e.g., [10-17]) have revealed several key amino acids which are involved
in proton transfer and in maintaining the functional state of the pigment. These
studies, together with a moderate resolution structure for BR [18], have led to the
conclusion that the isomerization is followed by transfer of the RSB proton to
aspartate D85, which is part of a complex counterion near the extracellular protein


*  To  whom  correspondence  should  be  addressed. 
surface, consisting of two aspartates (D85, D212), an arginine (R82), two tyrosines
(Y57,  Y185),  and  bound  water  molecules  [14,19,20]. 

During the lifetime of the M intermediate with the deprotonated retinal Schiff
base, two kinetically different steps occur: One is a protein conformational change
which guarantees that the reaction proceeds mainly in the forward direction (the
“switch step” [8,21-23]); the other step involves pH-dependent proton release to
the extracellular and proton uptake from the cytoplasmic side of the membrane
[24-26]. The groups which accomplish these proton exchange reactions at the surface
must have suitable pKa’s. The proton release complex initially has a high pKa,
which is decreased during the first half of the photocycle. The pKa of this group in
the M intermediate(s) (determined to be approx. 6 [24]) causes the photocycle to
diverge into two pH-dependent alternative pathways: If the pH is lower than
the pKa of this release group, proton release is delayed until after uptake. At higher pH
a transient proton deficit in the protein develops.

On the cytoplasmic side, aspartate D96 and several water molecules are part of
the proton uptake channel, which is involved in the reprotonation of the Schiff
base in the M → N step [11,27]. The original pKa of the uptake groups is near 11
[26]. The corresponding difference in the pKa’s of the release and the uptake group
is enough to pump protons against an electrochemical gradient of ≈300 mV. Proton
uptake from the bulk, reprotonation of D96, reisomerization of the retinal Schiff
base to all-trans, and protein relaxations reestablish the original situation [2,25,26].
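
The gradient of roughly 300 mV quoted above can be checked with a short calculation. This is a sketch under stated assumptions: the temperature of 298.15 K is not given in the text, and the pKa values are the approximate ones quoted above (about 6 for the release group in M, about 11 for the uptake group). Each pKa unit corresponds to 2.303RT/F of electrochemical potential for a single proton.

```python
import math

R = 8.314462618      # gas constant, J/(mol K)
F = 96485.33212      # Faraday constant, C/mol
T = 298.15           # assumed temperature in K (not stated in the text)

def pka_difference_to_millivolts(pka_uptake, pka_release, temperature=T):
    """Membrane potential (mV) equivalent to a pKa difference for one proton."""
    volts_per_pk_unit = math.log(10.0) * R * temperature / F
    return 1000.0 * volts_per_pk_unit * (pka_uptake - pka_release)

gradient_mv = pka_difference_to_millivolts(11.0, 6.0)
print(f"{gradient_mv:.0f} mV")
```

Under these assumptions a difference of five pKa units gives a value just under 300 mV, in line with the figure in the text.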
In  this  study  we  discuss  the  proton  release  pathway  in  the  bacteriorhodopsin 
photocycle.  The  various  aspects  of  the  proton  transfer  mechanism  are  investigated 
using  a  combination  of  different  theoretical  methods: 

• The retinal isomerization reaction, together with the accompanying protein structural
changes and charge separations, is discussed by means of quantum chemical and molecular
dynamics calculations. As we start from the medium resolution structure for BR
[18], this includes the completion of the structural information by hydration
of the intrahelical regions.

• General electrostatic concepts allow the energetics of the proton pump to be analyzed
in terms of the changed molecular structure during the photocycle. This includes
the determination of the pKa’s for different sites with a continuum dielectric
model and atomic details. The calculations include pKa shifts due to the electrostatic
effects of burying charged groups in the low dielectric membrane, interactions
with protein and water dipoles, and the coupling of titratable residues with
each other.

We present a sequence of isomerization driven protein conformational changes
which leads to a description of the proton release pathway consistent with experimental
results. In particular, we discuss the role of a bound water molecule in the
counterion complex as a transient proton binding site and its involvement in the
dissipation of free energy in the L to M transition, as well as the molecular nature
of the reorientation switch during the lifetime of the M intermediates. We propose
a direct involvement of arginine R82 in the protein reorientation step (M1 →
M2). Its movement towards the extracellular side catalyzes the pKa reduction of a
glutamate residue (E204), which acts as the proton release group. Our calculations
explain the complex pH dependence of the proton uptake and release pattern found
for the M intermediates and suggest the pH-dependent splitting of the photocycle
at the stage of the M2 intermediate.

Methods 

Energetics  of  Proton  Transfer  Reactions 

The use of general electrostatic concepts [28-30] allows proton transfer reactions to be
analyzed and described in terms of the underlying molecular structure. The energetics
and the rates can be estimated from the actual polarity of the relevant sites.
A certain proton configuration (m) of the protein with N titratable residues will be
described by the protonation vector $x^{(m)} = (x_1^{(m)}, \ldots, x_N^{(m)})$. Proton transfer from
residue i to residue j connects two protein states, e.g.,

$$(m) = (\ldots, x_i^{(m)} = 1, \ldots, x_j^{(m)} = 0, \ldots) \rightarrow (n) = (\ldots, x_i^{(n)} = 0, \ldots, x_j^{(n)} = 1, \ldots).$$

The main factor that determines the activation barrier $\Delta G^{\#}$ (and, therefore, the
rate constant) is the corresponding free energy difference between reactant and
product state, $\Delta G(m \rightarrow n)$. This free energy difference can be correlated with the
pKa difference for proton donor and acceptor:

$$\Delta G^{\#} \propto \Delta G(m \rightarrow n) \approx -2.3\,kT\,[\mathrm{p}K^{(n)}(j) - \mathrm{p}K^{(m)}(i)] = \Delta G_{PT}.$$

The  free  energy  for  a  protonation  state  of  the  protein  at  a  given  pH  [30-32]  is 

$$\Delta G(m) = \sum_{i=1}^{N} x_i^{(m)} \left[\Delta G_{\mathrm{int}}(i) + 2.3\,kT\,\mathrm{pH}\right] + \frac{1}{2} \sum_{i,j=1}^{N} W_{ij} \left(q_i^0 + x_i^{(m)}\right)\left(q_j^0 + x_j^{(m)}\right) \quad (i \neq j)$$

with

$q_i^0$ = charge of the unprotonated residue (acid: -1; base: 0),

$\Delta G_{\mathrm{int}}(i)$ = self-energy of residue i in the protein, where all other sites are in their
neutral state or enter as fixed background charges,

$W_{ij}$ = interaction between titrating sites, i.e., the work required to charge site j in the
presence of a charge on site i,

k = Boltzmann’s constant,

T = temperature.

The self energy $\Delta G_{\mathrm{int}}(i)$ can be further divided into three contributions:

$$\Delta G_{\mathrm{int}}(i) = \Delta G_{\mathrm{model}}(i) + \Delta\Delta G_{\mathrm{solv}}(i) + \Delta\Delta G_{\mathrm{back}}(i)$$

with

$\Delta G_{\mathrm{model}}(i)$ = protonation free energy for the residue in aqueous solvent (as reference
state), which can be calculated from the pKa value in solvent:
$\Delta G_{\mathrm{model}}(i) = -2.3\,kT\,\mathrm{p}K_{a,\mathrm{model}}(i)$,

$\Delta\Delta G_{\mathrm{solv}}(i)$ = difference in reaction field energy for protonated and unprotonated
residue in the reference state and the protein,

$\Delta\Delta G_{\mathrm{back}}(i)$ = difference in electrostatic interaction energy for protonated and unprotonated
residue with permanent dipoles and constant background
charges in reference state and protein.

Changes  in  self  energy  and  site-site  couplings  shift  the  acid-base  equilibrium  for  a 
residue  in  a  given  protonation  state  of  the  protein,  which  is  conveniently  expressed 
as  the  actual  pKa  value  of  the  residue: 

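$$\mathrm{p}K^{(m)}(i) = \mathrm{p}K_{\mathrm{int}}(i) - \sum_{j=1}^{N} W_{ij}\left(q_j^0 + x_j^{(m)}\right) / 2.3\,kT \quad (j \neq i)$$

with $\mathrm{p}K_{\mathrm{int}}(i) = -\Delta G_{\mathrm{int}}(i)/2.3\,kT$.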


The  average  protonation  of  a  site  i  is  given  by  the  Boltzmann  weighted  sum  over 
all $2^N$ possible protonation states of the protein ($m = 1, \ldots, 2^N$):

$$\langle x_i \rangle = \sum_m x_i^{(m)} \exp[-\Delta G(m)/kT] \Big/ \sum_m \exp[-\Delta G(m)/kT].$$

In  this  approximation  the  changes  in  protonation  free  energies  are  assumed  to 
result  only  from  a  change  in  the  enthalpy. 
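
For a small number of sites, the Boltzmann-weighted average can be evaluated directly by enumerating all protonation states. The following is a minimal sketch, not the authors' program: the one-site and two-site parameters are invented for illustration, all energies are expressed in units of kT, and the factor written as 2.3 in the text is taken as ln 10.

```python
import itertools
import math

LN10 = math.log(10.0)  # the factor written as 2.3 in the text

def state_energy(x, pk_int, q0, w, ph):
    """Free energy (in units of kT) of protonation vector x at a given pH."""
    n = len(x)
    # Self term: x_i * (dG_int(i) + 2.3 kT pH), with dG_int = -2.3 kT pK_int
    energy = sum(x[i] * LN10 * (ph - pk_int[i]) for i in range(n))
    # Pairwise term: 1/2 * sum over i != j of W_ij (q_i^0 + x_i)(q_j^0 + x_j)
    for i in range(n):
        for j in range(n):
            if i != j:
                energy += 0.5 * w[i][j] * (q0[i] + x[i]) * (q0[j] + x[j])
    return energy

def mean_protonation(pk_int, q0, w, ph):
    """Boltzmann-averaged protonation <x_i> over all 2^N states."""
    n = len(pk_int)
    states = list(itertools.product((0, 1), repeat=n))
    energies = [state_energy(s, pk_int, q0, w, ph) for s in states]
    e_min = min(energies)  # shift energies for numerical stability
    weights = [math.exp(-(e - e_min)) for e in energies]
    z = sum(weights)
    return [sum(wt * s[i] for wt, s in zip(weights, states)) / z
            for i in range(n)]

# Hypothetical examples: one isolated site, and two acids (q0 = -1) with
# pK_int = 4 whose charged forms repel each other by 2 kT.
single = mean_protonation([7.0], [-1], [[0.0]], ph=7.0)
coupled = mean_protonation([4.0, 4.0], [-1, -1], [[0.0, 2.0], [2.0, 0.0]], ph=4.0)
```

In the coupled toy case the repulsion between the two anionic forms raises the average protonation at pH 4 above one half, which is the kind of site-site shift the actual pKa expression describes. Full enumeration scales as 2^N and is practical only for small subsets of titratable groups.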

Electrostatic  Calculations 

Calculations  of  reaction  field  energies  and  electrostatic  interactions  between 
charged groups were carried out with the finite difference Poisson-Boltzmann (FDPB)
method using the program package DelPhi [31,33-35]. Partial charges and van der
Waals radii for the amino acids were taken from the CHARMm 21 parameter set
[36]. The partial charges for the free base and the protonated Schiff base retinal
chromophore were calculated with the program package MOPAC 6.0 using the
AM1 Hamiltonian [37].

The protein and the membrane region were treated as a low dielectric cavity (dielectric
constant $\varepsilon_{\mathrm{in}} = 4$) embedded in an aqueous medium of dielectric constant
$\varepsilon_{\mathrm{out}} = 80$ with an electrolyte of ionic strength 0.15 M and an ion exclusion radius
of 2 Å. The cytoplasmic surface of the membrane is presumed to be even with
glutamic acid E166; the extracellular surface is even with E74. This results in a
membrane of approx. 45 Å thickness [38]. Twofold focusing resulted in a final
resolution of 1.4 grid points/Å. Test calculations using rotational averaging revealed maximal
errors of 10% for $\Delta\Delta G_{\mathrm{back}}$ and between 5% and 10% for $W_{ij}$.

As  it  is  assumed  that  the  aqueous  medium  contains  mobile  ions,  our  calculations 
account  to  a  certain  degree  for  the  difference  between  bulk  pH  and  surface  pH. 
Due  to  the  omission  of  the  interhelical  loop  regions  and  the  membrane  lipid  head 
groups,  the  description  of  the  screening  will  be  incomplete. 
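
To give a sense of the screening scale implied by the 0.15 M electrolyte, the Debye length can be estimated from the standard expression for a 1:1 salt. This is an illustrative sketch only: the temperature (298.15 K) is an assumption, the function name is hypothetical, and the calculation is not part of the DelPhi workflow described above.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity, F/m
KB = 1.380649e-23           # Boltzmann constant, J/K
E_CHARGE = 1.602176634e-19  # elementary charge, C
N_A = 6.02214076e23         # Avogadro constant, 1/mol

def debye_length_angstrom(ionic_strength_molar, eps_r=80.0, temperature=298.15):
    """Debye screening length (in angstrom) for an ionic strength in mol/L."""
    ionic_strength_si = ionic_strength_molar * 1000.0  # convert to mol/m^3
    kappa_sq = (2.0 * N_A * E_CHARGE**2 * ionic_strength_si
                / (EPS0 * eps_r * KB * temperature))
    return 1e10 / math.sqrt(kappa_sq)

debye = debye_length_angstrom(0.15)
```

With these inputs the screening length comes out at roughly 8 Å, several times the 2 Å ion exclusion radius and much smaller than the 45 Å membrane thickness.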

The pKa values for titratable amino acids (asp, glu, lys, arg, tyr) in aqueous
medium were taken from [39]. Water and R-OH residues (thr, ser) have two
states of protonation and correspondingly two pKa values. For the hydroxyl form
(ROH ⇌ RO- + H+) we used pKa(ROH) = 15.7 [40]; for the hydronium form
(ROH2+ ⇌ ROH + H+) we used pKa(ROH2+) = -1.7 [40]. The possible protonation
of a backbone peptide group was investigated with the model compound
acetamide ($\mathrm{p}K_{\mathrm{model}}$ = 0.0 [40]). The $\mathrm{p}K_{\mathrm{model}}$ value used for the all-trans retinal
Schiff base (7.0) was taken from Ref. [32].

As was proposed earlier [41], the trans-cis isomerization of the C13=C14 bond
induces additional twists in neighboring single bonds, which can result in a stereochemical
pKa reduction. To determine these pKa shifts for the 13-cis intermediates,
we calculated quantum chemically (MOPAC 6.0 [37], AM1 Hamiltonian, lowest
100 singly and doubly excited configurations) the total energies for the isolated
neutral and protonated chromophore, whose overall geometry was kept near the
value found in the energy minimized protein structure (see below) by constraining
the dihedrals. The energy differences relative to the all-trans chromophores enter
as modified $\mathrm{p}K_{\mathrm{model}}$ values for the RSB.

Preparation  of  the  Bacteriorhodopsin  Structure 

The medium resolution structure for bacteriorhodopsin, determined by electron
cryomicroscopy [18], has been taken as the starting point for our investigation of
bacteriorhodopsin and its intermediates. Energy minimization and molecular dynamics
calculations were carried out with the CHARMm force field ([36], version 21 parameter
set, dielectric constant ε = 1). Coordinates for polar hydrogen atoms were
built using the HBUILD facility of CHARMm. Partial charges for the Schiff base
have been determined using the program package MOPAC 6.0 [37]. The torsional
and bond stretch force constants and equilibrium values for the polyene-like, unprotonated
retinal were taken from the CHARMm parameters. For the protonated
retinal Schiff base, these values were scaled to reflect the quantum chemically determined
[41] bond features.

The protonation states for titratable groups which participate in proton transfer
were taken to be in accordance with experimental observations (e.g., [10-17]) or a
recent theoretical calculation of pKa values [32] (protonated: RSB, D115, D96,
Y57, Y185, T89, R82, E204; deprotonated: D85, D212). The ionization state for
other titratable residues was selected according to their protonation at pH 7. The
electrostatic calculation [32] also supports the repositioning of R82 from the extracellular
side to the interior of the protein.

The structure of Henderson et al. [18] describes only the helical part of bacteriorhodopsin;
no coordinates for the atoms in the interhelical loop regions are provided.
We refrained from the explicit construction of the missing loop regions and
adopted heavy patch residues for the terminal groups of the helices. The integrity
of the protein was maintained by applying harmonic constraints, which increased
continuously from an inner cylindrical region of 10 Å radius without constraints to
the outer region [force constant 0.2 kcal/(mol Å²)]. This selection of harmonic
restoring forces simulates the protein-membrane interaction and ensures that all
atoms fluctuate with approximately the same mean amplitude. Several molecular
dynamics calculations [42,43] revealed that the Henderson structure [18] is close
to a minimum energy conformation. The application of constraints to keep the
outer surface of the protein intact imposes, therefore, no unphysical restrictions.

The positions of intramembrane water molecules (TIP3 water model [44]) were
carefully inspected: To use the experimental information about water cavities in
the native state, no vacuum geometry optimization was performed. In a first step,
each residue with a solvent accessible surface > 10 Å² was surrounded by an equilibrated
8 Å water sphere. Water molecules with bad contacts (<2.8 Å distance to
protein heavy atoms) were eliminated. The positions of the solvent molecules were
optimized relative to fixed protein atoms. A subsequent short (3 ps) molecular
dynamics simulation (rapid heating from 0 K to 300 K) was used to create a random
set of conformations as input for the geometry optimization algorithm utilizing the
threshold accepting principle [45]. This algorithm allows large displacements of the
atoms from the starting configurations, supports the inspection of a large region
of the potential energy surface, and leads, therefore, to a structure near the global
minimum. Fourfold application of this solvation-optimization procedure led to a
water saturated structure. Subsequent optimization of solvent and protein and a
160 ps molecular dynamics calculation (10 ps heating to 300 K, 50 ps equilibration,
100 ps free dynamics at 300 K, graduated harmonic constraints applied) helped to
identify tightly bound water molecules.
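
The threshold accepting principle mentioned above can be illustrated on a toy problem. The sketch below is not the authors' implementation: the one-dimensional energy function, the step size, and the threshold schedule are all hypothetical choices made only to show the acceptance rule, in which any trial move is accepted as long as it does not raise the energy by more than the current threshold.

```python
import math
import random

def threshold_accepting(energy, x0, step, thresholds, moves_per_level, rng):
    """Minimise `energy` by threshold accepting; return (best_x, best_energy)."""
    x, e = x0, energy(x0)
    best_x, best_e = x, e
    for threshold in thresholds:          # thresholds decrease towards zero
        for _ in range(moves_per_level):
            trial = x + rng.uniform(-step, step)
            e_trial = energy(trial)
            # Accept any move that does not raise the energy by more than the
            # current threshold (no random acceptance test, unlike annealing).
            if e_trial - e < threshold:
                x, e = trial, e_trial
                if e < best_e:
                    best_x, best_e = x, e
    return best_x, best_e

def rugged(x):
    """Toy one-dimensional surface with several local minima (hypothetical)."""
    return 0.1 * x * x + math.sin(3.0 * x)

rng = random.Random(1)
start = 4.0
best_x, best_e = threshold_accepting(
    rugged, start, step=0.5,
    thresholds=[1.0, 0.5, 0.25, 0.1, 0.0],
    moves_per_level=500, rng=rng)
```

Because uphill moves below the threshold are always accepted, the walker can leave shallow local minima early in the schedule, which is the property the text relies on when it speaks of inspecting a large region of the potential energy surface.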

Generation  of  Intermediates 

Experimental information about the light-induced changes in the chromophore
geometry (e.g., [2,3,21]) argues for an all-trans to 13-cis isomerization as the primary
photochemical event. The potential which governs the motion of a dihedral angle
$\varphi$ in CHARMm is $V_\varphi = K_\varphi[1 + \cos(n\varphi + \delta)]$ with n = 2 (periodicity) and δ = 180°
(phase factor) in the case of the planar, all-trans retinal chain conformation. Quantum
chemical calculations [3,46] and fs time resolved spectroscopy [2,3] suggest
that the excited state potential minimum is near the ground state isomerization
barrier. We use the information from these theoretical calculations to construct an
approximate S1 potential surface for the C13=C14 dihedral in the retinal Schiff
base: $V^* = K^*[1 + \cos(n\varphi)]$ with K* = 5.6 kcal/mol and the periodicity n = 2.
The phase factor δ = 0 shifts the S1 equilibrium positions to ±90°. The excited
state topology was completed with a new set of partial charges for the protonated
retinal resulting from a quantum chemical calculation (INDO-SDTCI [47]). The photoisomerization
was induced by abruptly changing the C13=C14 torsional potential
from S0 to S1 topology, thereby placing the bond at the excited state
maximum. A short molecular dynamics calculation monitored the relaxation of
the chromophore away from the Franck-Condon region to the new equilibrium
position, which was completed in 270 fs, a value also found experimentally [2].
The isomerization direction is determined by the interaction of the chromophore
with the surrounding protein. Strong electrostatic interaction between the Schiff
base proton and the negative partial charge of the main chain oxygen at position
212 in helix G favors the twist from 180° to -90°.
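
The effect of switching the phase factor can be seen by evaluating the cosine torsion term at a few angles. This is a sketch only: the excited state constant K* = 5.6 kcal/mol is taken from the text, while the ground state force constant is a placeholder value, since the text does not quote it.

```python
import math

def torsion_energy(phi_deg, k, n, delta_deg):
    """Cosine torsion term V = K [1 + cos(n*phi + delta)], angles in degrees."""
    return k * (1.0 + math.cos(math.radians(n * phi_deg + delta_deg)))

K_S0 = 10.0   # hypothetical ground state force constant (not given in the text)
K_S1 = 5.6    # kcal/mol, excited state value quoted in the text

for phi in (180, 135, 90, 0, -90):
    v0 = torsion_energy(phi, K_S0, 2, 180.0)   # S0: minima at 0 and 180 degrees
    v1 = torsion_energy(phi, K_S1, 2, 0.0)     # S1: minima at +90 and -90 degrees
    print(f"phi = {phi:5d}  V(S0) = {v0:6.2f}  V(S1) = {v1:6.2f} kcal/mol")
```

With δ = 0 the all-trans geometry (180°) sits at an S1 maximum of 2K* = 11.2 kcal/mol, so switching the potential at that geometry leaves the chromophore free to relax towards ±90°.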

To enforce the completion of the all-trans → 13-cis isomerization on the ground
state potential surface, we assumed the storage of 50 kcal/mol of torsional energy


PROTON  RELEASE  PATHWAY  IN  BACTERIORHODOPSIN 


in the C13=C14 bond and varied the equilibrium position of the dihedral angle
in 30° steps to the final 13-cis value (0°). As the isomerization time is short [2]
compared to protein relaxation times, the equilibrated geometry of the K intermediate
was obtained with a frozen protein. Further relaxation of the 13-cis chromophore
transfers energy from the reaction coordinate to other degrees of freedom. Subsequent
protein structural changes lead to the L intermediate. Different molecular
models were discussed for the proton transfer and the M intermediates.

To test the plausibility of the generated structures, we performed quantum chemical
calculations of the absorption spectra using the INDO-SDTCI approach [47].
The inclusion of singly, doubly, and triply excited electron configurations in the
configuration interaction procedure accounts for the polarizabilities in the ground
and excited states, which are essential for the electronic structure of the chromophore.
The retinal and the charged and polar groups (including water) in a 10 Å surrounding
spherical region were treated as a supermolecule. This approach allows one to consider
the charge induced wavelength shifts. The calculated values (λmax/nm; dipole
strength/Debye²) are BR (547; 145), J (625; 54), and L (524; 148).

Results  and  Discussion 

The  BR  Ground  State 

Structure: As the Henderson structure [18] contains detailed information only
about the location of the β-ionone ring, the position of the conjugated portion of
the chromophore has to be worked out. It is determined by the steric strain exerted
by the tightly packed residues in the vicinity (tryptophans W86, W182, W189,
proline P186) and the electrostatic interactions between the protonated Schiff base
and anionic (aspartates D85, D212), cationic (arginine R82), or polar (tyrosines
Y57, Y185, and water) groups. The equilibrated all-trans chromophore is almost
planar except for a slight twist of the C14-C15 bond (dihedral -165°). Small twists
around single bonds were also found experimentally [48].

It  is  well  established  [14,20]  that  the  primary  proton  acceptor  in  the  first  half  of 
the  photocycle  (D85)  is  part  of  a  complex  counterion  system  that  consists  of  the 
aspartate  residues  D85,  D212,  arginine  R82,  the  tyrosines  Y57  and  Y185,  and 
probably a tightly bound water molecule [14,19]. The quadrupole-like arrangement
of RSB, R82, D85, and D212, which was proposed on the basis of NMR spectroscopy [20], is
a constituent part of our BR structure (see Fig. 1). The direct counterion for the
protonated RSB is D212, which is bound to the same helix as the RSB and is, therefore,
constrained to stay in its vicinity. Compared to the original structure, we find a
displacement of D85, which moves 2 Å away from the Schiff base because of electrostatic
repulsion between the two aspartic acids. The gap is filled by a tightly bound water
molecule (named X1 in the structure), which can now serve as direct counterion
in the proton release pathway.

It is experimentally established [12,49,50] that water molecules are functionally
important, especially in the uptake of the proton from the cytoplasm. Water structural
changes during the photocycle have been observed [51]. Neutron diffraction
investigations  [52]  suggest  that  there  are  at  least  four  tightly  bound  water  molecules 
Figure 1. BR intermediate: stereoview of retinal and functionally important residues in
the transmembrane region (amino acids are labeled by their one-letter code, water molecules
are labeled by X).


near  the  Schiff  base  and  several  exchangeable  protons  participating  in  the  pathway 
across  the  membrane.  Our  BR  structure  shows  two  water  chains  in  the  proposed 
proton  conduction  channel  on  the  cytoplasmic  side,  which  may  serve  as  proton 
wires for the transfer to the Schiff base. One spans the region from aspartate D96
over threonine T89 (two residues with an essential role in proton uptake
[11,12,16,54]) to another tightly bound water (named X2 in Fig. 1) and aspartate
D85, the primary proton acceptor in the release pathway. The other chain is interrupted
by a hydrophobic region near phenylalanine F219 and ends near the
tyrosines Y57 and Y185. Recent experimental results [50] indicate that at least 15
mol of water are required for optimal proton transfer to and from the Schiff base.
This is a much larger amount than needed for a single chain of water molecules
between RSB and D96. It probably represents diffuse hydration of the interhelical
region on the cytoplasmic side of the protein in spite of the hydrophobic nature of
the proton channel itself.

Acid-Base Equilibria: The usefulness and limitations of pKa calculations for the
BR ground state based on the medium resolution structure [18] have already been shown
[32]. Our calculations differ from that investigation through the inclusion of water
molecules in the structure, which appear either as permanent dipoles in the background
term or are treated explicitly as titrating groups, which can accept or donate
protons [35].
We include only a subset of titratable groups in our calculations: Besides the RSB,
we consider members of the counterion complex (D85, D212, R82, Y57, Y185, water
X1), four groups in the uptake channel (D96, T89, water X2, and D115), and E204
in the extracellular region. As our C13=C14 isomerization model revealed strong
electrostatic interaction between the Schiff base proton and a main chain oxygen,
we examined the importance of this peptide group as a transient proton acceptor.
All other ionizable groups were included in a fixed protonation state (corresponding
to bulk pH 7), assuming that there is no large pKa shift compared to the solvent
value. As a consequence, our titration calculations are reliable only in the pH range
from 3 (protonation of carboxylates) to 11 (deprotonation of tyrosines and lysines).

Table I shows the contributions of the solvation and background interaction terms
to the pKa shifts in the BR ground state. As found for other membrane systems
[32,35,53], the pKint value is determined by the desolvation penalty for burying
charged groups in the low dielectric inner membrane region and favors the neutral
protonation state. The shift is more pronounced for groups with a localized charge
distribution (water, tyrosine, threonine), while it is less important for residues with
largely delocalized charge distributions (especially arginine and the Schiff base).
The interaction with nontitratable residues and protein dipoles tends
to reduce the unfavorable shifts due to desolvation for members of the counterion
complex. Although D96 and E204 are surrounded by water molecules, the background
term calculated for these residues does not compensate for the unfavorable
solvation term. The solvent molecules around these residues are arranged according
to the neutral form of the aspartate or glutamate groups. Because the orientation of
the background dipoles had to be kept fixed in the electrostatic calculation, our
calculations do not account for the stabilization of the anionic charge form due to a
changed orientation of the water dipoles. For example, the magnitude of the unfavorable


Table I. Contributions of solvation and background terms to the pKa shift in the energy-minimized
structure of bacteriorhodopsin (BR).

Residue   pKmodel   ΔpKsolv   ΔpKback         $\langle x_i \rangle$   pKa
RSB          7.0      -4.3     2.7 [3.5]       1      15.2
D212         3.9      11.0    -8.5             0      -3.8
D85          3.9      11.2    -2.9 [-0.9]      0      -2.3
R82         12.5     -10.1     0.9 [-4.9]      1      26.1 [21.7]
X1          -1.7     -19.0    -0.4             0     -10.5
Y57         10.1      14.6    -4.0             1      34.2
Y185        10.1      14.6     0.9 [1.6]       1      30.2
X2          -1.7     -19.0     1.1 [1.7]       0     -15.4
T89         15.7      14.1     01 [11]         1      32.3
E204         4.3      11.4     4.7             1      20.7
D96          3.9      11.7     3.2             1      18.6

Included are the average degree of protonation $\langle x_i \rangle$ of each residue and the final pKa value in the
ground-state configuration. Values in square brackets are for the od1-hd tautomer of D85.
contribution of the water molecules to the charged form of E204 is approximately
+8 pKa units.
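
The way the solvation and background terms of Table I add up can be illustrated with a short Python sketch. It assumes the additive relation pKint = pKmodel + ΔpKsolv + ΔpKback, which is the usual convention for this type of continuum electrostatics treatment but is not written out in this section; the names `TABLE_I`, `intrinsic_pk`, and `PK_INT` are illustrative only.

```python
# Selected Table I entries (unbracketed values): (pKmodel, dpK_solv, dpK_back)
TABLE_I = {
    "RSB": (7.0, -4.3, 2.7),
    "D212": (3.9, 11.0, -8.5),
    "D85": (3.9, 11.2, -2.9),
    "R82": (12.5, -10.1, 0.9),
    "E204": (4.3, 11.4, 4.7),
    "D96": (3.9, 11.7, 3.2),
}


def intrinsic_pk(pk_model, dpk_solv, dpk_back):
    """Assumed additive relation: pKint = pKmodel + dpK_solv + dpK_back."""
    return pk_model + dpk_solv + dpk_back


# Intrinsic pK per residue, rounded to the precision of the table
PK_INT = {name: round(intrinsic_pk(*terms), 1) for name, terms in TABLE_I.items()}

if __name__ == "__main__":
    for name, value in PK_INT.items():
        print(f"{name:5s} pKint = {value:5.1f}")
```

Under this assumption, the difference between the final pKa listed in Table I and the intrinsic value is the part that arises from site-site coupling with the other titrating groups.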

Table II shows the site-site coupling for members of the counterion complex.
Due to the extended, alternating charge distribution, the interactions between the
protonated Schiff base and members of the counterion complex are balanced out.
The largest couplings are local pairwise interactions between R82-D85, D212-Y57,
and X2-T89. For the carboxylate and guanidyl groups of D85, D212, and
R82, we encounter the problem of hydrogen tautomerism: Which oxygen should
receive the proton, and which nitrogen should be the donor? In contrast to other
approaches [32], which neutralize or distribute the charges symmetrically, we discuss
both possibilities explicitly for the important residue D85. The two tautomers correspond
to the two physical situations in which either the interaction with arginine R82
(od1-hd tautomer, included in square brackets in Tables I and II) or the interaction with the
Schiff base (RSB) and the water molecules X2 and X1 dominates (od2-hd tautomer).
The same distinction holds for D212: The od1-hd tautomer (adopted for the
calculation) optimizes the interaction with Y57 and the RSB, the other tautomer
that with Y185.

To gain some insight into the effect of structural fluctuations on the results, we
calculated the pairwise electrostatic interactions for the 2800 frames of our 100 ps
molecular dynamics run. This is facilitated by the fact that the screening of electrostatic
interactions due to induced surface charges is not prominent in this membrane-bound
protein. Therefore, we determined the effective dielectric constant $\varepsilon_{\mathrm{eff}}$
for the interaction term $W_{ij}$ in the static structure as described earlier [34]:

$$\varepsilon_{\mathrm{eff}} = W_{ij}(\varepsilon = 1, \mathrm{Coulomb}) \,/\, W_{ij}(\varepsilon_{\mathrm{in}}, \varepsilon_{\mathrm{out}}, \mathrm{FDPB}).$$

$W_{ij}(\varepsilon_{\mathrm{in}}, \varepsilon_{\mathrm{out}}, \mathrm{FDPB})$ is the interaction term calculated with the FDPB method, where
the membrane is treated as a low dielectric region surrounded by the high dielectric
solvent; $W_{ij}(\varepsilon = 1, \mathrm{Coulomb})$ is the corresponding value calculated with Coulomb’s
law and a constant dielectric constant. The $\varepsilon_{\mathrm{eff}}$ values vary between $0.9\,\varepsilon_{\mathrm{in}}$ and
$1.5\,\varepsilon_{\mathrm{in}}$ and indicate no large dielectric inhomogeneities in the protein. From the
time series of the $W_{ij}(\varepsilon = 1, \mathrm{Coulomb})$ values, mean values and standard deviations
can be extracted (lower lines in Table II). The largest fluctuations, on the order of
1-2 pH units, are found for the OH groups of water, tyrosine, and threonine residues.
No value for the static structure lies outside the distribution. It also becomes clear
that the strong interaction of D85 and R82 dominates over the interaction between
D85 and X1. Fluctuations in the background term ΔpKback, calculated with the
same approach, are approximately 1 pK unit for residues in the active site and 2
pK units for the groups of E204 and D96.
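
The procedure described above can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the authors' implementation: each site is reduced to a single point charge (the actual calculation uses the full charge distributions), the temperature is taken as 298.15 K, and all numerical inputs are invented for the example. The function names are hypothetical.

```python
import math
import statistics

COULOMB_KCAL = 332.06   # e^2 / (4 pi eps0) in kcal*Angstrom/mol
R_KCAL = 1.987e-3       # gas constant in kcal/(mol*K)


def coulomb_pk(q_i, q_j, r_angstrom, eps=1.0, temperature=298.15):
    """Point-charge Coulomb interaction in pK units (energy / (ln10 * R * T))."""
    energy = COULOMB_KCAL * q_i * q_j / (eps * r_angstrom)
    return energy / (math.log(10) * R_KCAL * temperature)


def effective_dielectric(w_coulomb_eps1, w_fdpb):
    """eps_eff = W_ij(eps = 1, Coulomb) / W_ij(eps_in, eps_out, FDPB)."""
    return w_coulomb_eps1 / w_fdpb


def fluctuation(distances, q_i, q_j, eps_eff):
    """Mean and standard deviation of the screened interaction over MD frames."""
    series = [coulomb_pk(q_i, q_j, r, eps=eps_eff) for r in distances]
    return statistics.mean(series), statistics.stdev(series)


# Hypothetical numbers, for illustration only (not taken from Table II).
frames = [4.0, 4.2, 3.9, 4.1, 4.3]           # charge-charge distances in Angstrom
w_static_coulomb = coulomb_pk(+1, -1, 4.0)   # static structure, eps = 1
w_static_fdpb = w_static_coulomb / 4.0       # stand-in for an FDPB result
eps_eff = effective_dielectric(w_static_coulomb, w_static_fdpb)
mean_w, std_w = fluctuation(frames, +1, -1, eps_eff)
```

The point of determining the effective dielectric constant once for the static structure is that every frame of the trajectory can then be evaluated with Coulomb's law alone, without repeating the FDPB calculation.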

The calculated average degrees of protonation $\langle x_i \rangle$ for the residues (Table I)
confirm the experimentally determined protonation states and are constant over the
pH range from 3 to 11. The pKa values for the residues in this ground-state
configuration are also included in Table I. Their maximal uncertainties are ±3 pK
units. The available corresponding experimental values are: 13.2 (RSB [55]), <2.5
(D85, D212 [10,56,57]), >12 (Y57, Y185 [32]), 13.6 (R82 [15]), and >11 (D96
[26]). The general overestimation of pKa shifts by the applied method has its origin

Table  II.  Site-site  coupling  in  bacteriorhodopsin. 
in the forced constant geometry for the charged and neutral group, which prevents
the residue from changing position in the protein to optimize the electrostatic
interaction [32,31].

As pKa shifts in our approximation are due only to mutual Coulombic interactions
of the groups, the site-site coupling values $W_{ij}$ (Table II) give the contribution for
each residue i due to the pairwise interaction with the other sites j. For the RSB we
find that the positive charge on R82 lowers the pKa by 5 units, while the negative charges
on D85 and D212 raise the value by 7 and 9 units, respectively. This means that
both aspartate residues are necessary to hold the retinal in its protonated form over
a large pH range. The corresponding experimental values from mutation experiments
[15] are a reduction by 2.5 units due to R82 and an increase by 5 units due
to D85. The dominant contribution in lowering the pKa of D85 is the positive
charge on R82, as the rise due to the negative D212 largely compensates for the reduction
due to the positive RSB. The corresponding mutation experiment [15] reveals a
reduction by 4.5 units due to R82.

Under the assumption of a fixed geometry, it is also possible to analyze how the
proton affinities change if some of the groups are neutralized. Switching from the
ground-state proton configuration (m) = (..., x_i = 0(1), ...) to a configuration
(m') = (..., x_i = 1(0), ...) shifts the pKa’s of the other residues in proportion to
the change in electrostatic potential due to group i. These values can be compared
to the results of mutation experiments (e.g., [15,17]). For the situation where R82
is neutralized (e.g., R82Q mutant), the removal of a positive charge leads to an
increase of the pKa values for the RSB (+5 units), D85 (+16 units), and D212 (+11
units). The considerably lower experimental value for aspartate D85 (+4.5 units
[15,17]) is a clear hint of severe structural reorientations occurring upon this
mutation. If both D85 and R82 are uncharged, the calculated pKa reduction for the
RSB by 2 units agrees with the experimentally found reduction from 13.2 to 10.6 [17].
If R82 is charged and D85 is neutral, the spectroscopic titration experiment finds
a reduction of pKa (RSB) from 13.2 to 8.4 and the appearance of a second component
with an apparent pKa of 10.6 [15]. Our results indicate that this charge distribution
in the protein is coupled to a drastic reduction in pKa for the RSB (-7.2 units),
D212 (-9 units), T89 (-10 units), D115 (-5 units), and R82 (-10 units). The
second component could, therefore, be due to the deprotonation of R82 or T89 in
the pH range around 10.

Ground-State Charge Heterogeneities: Deviations from the BR ground-state proton
distribution appear at extremely low pH values, where either D85 or D212 becomes
protonated. This is experimentally established by the purple-blue transition of the
membrane and the blocking of the proton pump activity, which was attributed to
protonation of D85 at acidic pH [14,56,57].

Small shifts and amplitude changes in the spectrum of the unphotolyzed wild-type
bacteriorhodopsin with increasing pH above 8 suggest some pH-dependent
heterogeneity [58]. There are at least three additional states: a blue-shifted (λmax
480 nm) all-trans component, an N-like species possibly with deprotonated D96
[59], and a red-shifted species, which is attributed to the ionization of a tyrosine
residue [60]. Two different BR states (α, β in a pH-dependent population ratio),
which differ with respect to the conformation (ionization) of at least one
specific side chain, were postulated from kinetics experiments [4-7]. It was shown
that these components undergo different photocycles: In the α-state of BR, proton
transfer from the Schiff base is catalyzed, while the β-state does not support or even
impedes this reaction step.

Heterogeneous charge distributions in bacteriorhodopsin can be discussed
by considering the proton configurations that are energetically close to the
ground-state configuration. One obvious origin of a charge heterogeneity near the
chromophore is the formation of a salt bridge between R82 and D85, which requires
ca. 7 kcal/mol under the assumption of optimal orientation (od1-hd tautomer of
D85). If this salt bridge is maintained during the isomerization, the preferential
proton acceptor site will be blocked for a subset of bacteriorhodopsin molecules.
In contrast to the protonation of D85 at low pH, which shifts the absorption
wavelength 35 nm bathochromically, the salt bridge would induce only a small wavelength
shift, as judged from the 10 nm red shift found for the double mutant R82A/D85N
[15]. The role of R82 in the ground-state heterogeneity was experimentally
established [17] by the fact that for the R82A mutant the red shift of the chromophore
absorption found for wild-type bacteriorhodopsin around pH 9 did not occur. This
investigation [17] also revealed the influence of a transient protonation of D85
on the rate of thermal trans-cis isomerization of the retinal chromophore. The
changed charge distribution near the chromophore will, therefore, induce some
structural heterogeneity, too. The other type of heterogeneity, initiated by the
pH-catalyzed deprotonation of a residue, can be discussed in terms of the response of
the other residues to the ejection of a proton: the deprotonation of D115 in BR,
e.g., changes the proton affinities of the residues in the active site according to the
individual pairwise interactions $W_{ij}$ (ΔpKa ≈ +3.5 units).

If the acid-base equilibration of the amino acid occurs faster than the formation
of all of the photocycle intermediates, only one cycle should be observed [7] and
the inhomogeneity will be averaged out in the kinetics. However, more than one
parallel cycle is expected if some of the residues that affect the rate of formation of
an intermediate have protonation/deprotonation rates on a comparable or slower
time scale. From the definition of the acid-base equilibrium constant
$K_a = 10^{-\mathrm{p}K_a} = k_{\mathrm{dissoc}}/k_{\mathrm{assoc}}$, the deprotonation time can be estimated [7]. Assuming that
the proton association with a pKa value between 2 and 9 is diffusion controlled
with $k_{\mathrm{assoc}} = 10^{11}\ \mathrm{M^{-1}\,s^{-1}}$ in aqueous medium ($10^{6}\ \mathrm{M^{-1}\,s^{-1}}$ in nonaqueous systems),
one calculates deprotonation times that range from $10^{-9}$ s to $10^{3}$ s.
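
The quoted range of deprotonation times follows directly from the definition of the equilibrium constant. The Python sketch below works through the two limiting cases; the pairing of pKa 2 with the aqueous rate constant and pKa 9 with the nonaqueous one is our reading of how the extremes arise, and the function name is illustrative.

```python
def deprotonation_time(pka, k_assoc):
    """Return 1/k_dissoc in seconds, with k_dissoc = k_assoc * 10**(-pKa)."""
    k_dissoc = k_assoc * 10.0 ** (-pka)   # s^-1, since Ka carries units of M
    return 1.0 / k_dissoc


K_ASSOC_AQUEOUS = 1e11     # M^-1 s^-1, value quoted in the text
K_ASSOC_NONAQUEOUS = 1e6   # M^-1 s^-1, value quoted in the text

fastest = deprotonation_time(2.0, K_ASSOC_AQUEOUS)      # about 1e-9 s
slowest = deprotonation_time(9.0, K_ASSOC_NONAQUEOUS)   # about 1e3 s
```

A group whose deprotonation time is comparable to or longer than the formation time of a photocycle intermediate cannot equilibrate within that step, which is the condition for observing parallel cycles.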

The  L  Intermediate 

For the discussion of molecular mechanisms for proton transfer reactions in
bacteriorhodopsin, the L structure is the key intermediate. The proton transport is
based on directed changes in the pKa’s of the Schiff base and specific groups in the
protein. In the L species all, or nearly all, of the acquired free energy (experimental
values vary between 10 kcal/mol (calorimetric measurements [61]) and 23 kcal/mol
(photoacoustic data [62])) will be manifested in the changed pKa values of the
groups that participate in the proton transfer. The changes in pKa can be brought
about by several factors: (A) The distortion of the retinal geometry could
decrease the pKa of the RSB [41]. (B) A reduction in its pKa can be the result of a
change in the strength and geometry of hydrogen bonds. (C) Changed electrostatic
interactions, e.g., repulsion from positively charged residues and inhomogeneous
electric fields in the binding pocket [7], induce a high sensitivity of the electrostatic
interactions to even small protein conformational changes during the photocycle.

Structure: The structure of the L intermediate is shown in Figure 2. The relaxed
13-cis chromophore exhibits large-to-moderate twists in the dihedrals neighboring
the isomerized C13=C14 bond (C12-C13: 153°, C14-C15: 170°, C15=NZ:
157°), which indicates that the torsional strain caused by the double bond
isomerization is not localized in a particular neighboring bond, as is assumed by
di-cis models (e.g., [63]). The introduction of twists in the retinal moiety close to the
Schiff base was also characterized by FTIR spectroscopy [48]. These twists give rise
to a calculated stereochemical pKa reduction for the retinal chromophore by 2 pK
units, which enters as a reduced pKmodel = 5 for the 13-cis chromophore, compared
to pKmodel = 7 for the all-trans retinal.

The accompanying reorientation of the Schiff base proton towards helix G disrupts
the approximately quadrupole-like arrangement of RSB, D85, D212, and R82, which
controls the electrostatic interactions in the BR ground state. The dominant
interactions of the protonated RSB in the L structure are strong hydrogen bonds to the
D212 peptide and carboxylate groups (nearest distances to NZ 3.0 Å and 3.4 Å, respectively).
The large attraction of the Schiff base proton to the D212 main chain oxygen even
determines the isomerization direction along the excited-state potential surface (see


Figure  2.  L  intermediate — stereoview  of  retinal  and  functionally  important  residues  in 
the  proton  release  pathway. 
Methods). An increased hydrogen-bond strength of the Schiff base was also found
experimentally by Raman spectroscopy [2,21]. The isomerization-induced strain
in the binding pocket manifests itself experimentally in backbone conformational
changes, which give rise to the changed pattern of the amide I/II bands in FTIR
difference spectra (e.g., [9,22]); this pattern appears even if the RSB is not
deprotonated in the following step. Another prominent change occurring upon L
formation is the observed stronger hydrogen bonding of water and O-H side chains
of the protein (e.g., threonine, tyrosine) [27,51,54].

The deformation of the binding pocket upon isomerization induces the movement
of threonine T89 (on helix C) and tyrosine Y57 (on helix B) relative to each other. As
a consequence, the electrostatic coupling of residues in the interconnecting gap
increases (e.g., Y57-X1, Y57-X2, Y57-T89, but also D85-X1, D85-X2), while
the interaction between R82 and D85 is reduced. These local alterations in the
region of helices B and C near the binding site are possibly functionally important
motions, which influence the following reaction steps. The new interaction pattern
between D85, X1, X2, and T89 resembles a tightened spring, spanning the region
from the retinal Schiff base to residue T89 in the lower part of the proton uptake
channel. From there, the information about the changes in the active site can be
transmitted over three to four additional water molecules to aspartate D96
[11,12,50].

Acid-Base Equilibria: The protonation states of the titratable residues found for
the BR ground state are conserved for the ground state of the L intermediate. The
changes in electrostatic interactions are converted to pKa changes (Table III). In
addition to the 2 units of stereochemical reduction, the pKa of the retinal Schiff
base is further decreased by 3.5 units. At the same time, the pKa of D85 increases


Table III. Proton distribution [(•) protonated, (○) unprotonated residue] and changes in pKa values
for key residues during the first steps in the photocycle of bacteriorhodopsin: (a) ground state; (b) reference
state.

(a)
                                   M1                            M2
              BR         L    pH < 6.2  pH > 6.2    pH < 6     6 < pH < 8.5      pH > 8.5
RSB       15.2 •    11.7 •     6.2 •     6.2 ○     6.0 •     6.0 ○     8.5 •     8.5 ○
D212      -3.8 ○    -0.5 ○   -11.3 ○    -3.5 ○    -5.3 ○     2.6 ○    -1.8 ○     6.1 ○
D85       -2.3 ○     2.8 ○    11.5 •    17.7 •    14.7 •    21.1 •    17.5 •    23.9 •
X1       -10.5 ○   -10.4 ○   -27.0 ○   -19.8 ○   -25.7 ○   -17.5 ○   -23.2 ○   -15.0 ○
R82       26.1 •    28.8 •    15.7 •    20.0 •    15.7 •    18.4 •    29.1 •    31.8 •
E204      20.7 •    20.3 •    15.5 •    17.7 •     6.1 •     8.5 •     6.1 ○     8.5 ○

(b)
                      BR          L         M1        M2
RSB         ○       15.2       11.7       12.4      12.4
D212        ○        4.5        8.1       10.2      13.7
D85         ○        4.4    9.2 [2.7]     17.7      21.1
X1          ○        0.9       -2.4        0.7      -0.1
by 5 units, that of D212 by 3 units. The changed hydrogen bond pattern for tyrosine
Y57 reduces its pKa by 5 units. These shifts are essentially a consequence of the
overall changes in the binding pocket (dominant changes in ΔpKback) and are not due
to the shift in specific pairwise interactions.

In contrast to the titration behavior of BR, which was determined by a single,
energetically well-separated neutral ground state, the L intermediate shows a more
complex pattern (Fig. 5). The increased pKa of D85 induces proton uptake at pH
values <4, which leads to the admixture of a second L population (calculated 39%
at pH 3) for which one of the possible proton-accepting sites in the release pathway
is blocked. The situation is similar to the purple-blue transition for wild-type BR
and confirms that bacteriorhodopsin structures with all-trans and 13-cis retinal
have different pKa’s for the purple-to-blue transition. In the intermediate pH range
from 4 to 10, proton transfer states with low activation energy (ΔG_pt  1.3 kT)
occur (see below) concurrently with the ground state. In the alkaline pH region (>10),
the pH-catalyzed release of the Schiff base proton to the aqueous medium is induced
by its reduced pKa. The drastic reduction of the pKa for tyrosine Y57 favors its
deprotonation and the formation of a salt bridge to D212. On the other hand, the
salt bridge formation between R82 and D85 becomes more unfavorable compared
to the BR structure. These findings can be interpreted in terms of a change in the
protein heterogeneities and show that these inhomogeneities can also occur during
the photocycle (photoinduced).
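
As a plausibility check on the 39% population quoted above, the Python sketch below evaluates the Henderson-Hasselbalch occupancy of a single, isolated site. It uses the pKa of 2.8 listed for D85 in the L column of Table III(a). Treating D85 as uncoupled from the other sites is a simplification introduced here; the population reported in the text comes from the full multi-site titration.

```python
def protonated_fraction(ph, pka):
    """Henderson-Hasselbalch occupancy of an isolated single titrating site."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))


PKA_D85_IN_L = 2.8   # D85 entry in the L column of Table III(a)

fraction_at_ph3 = protonated_fraction(3.0, PKA_D85_IN_L)
fraction_at_ph4 = protonated_fraction(4.0, PKA_D85_IN_L)

if __name__ == "__main__":
    print(f"pH 3: {100 * fraction_at_ph3:.0f}% of D85 protonated")
    print(f"pH 4: {100 * fraction_at_ph4:.0f}% of D85 protonated")
```

In this single-site picture the protonated population falls to a few percent by pH 4, which is in line with the lower limit of the active pH range discussed in the text.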

The  titration  behavior  indicates  the  restriction  of  the  active  pH  range  for  proton 
pumping  to  the  pH  region  4  to  10,  as  was  found  experimentally  [25]. 

The L → M Transition

The experimental characterization of the L to M transition is complicated by
the occurrence of several kinetically distinguishable apparent M species. This
prompted the proposal of different kinetic models: (A) The biphasic rise and
decay may reflect separate reaction pathways of two physically different
components with independent and unidirectional cycles [4-7]. (B) Besides these
parallel-cycle models, a single-cycle reaction sequence containing reverse reactions and two
M forms is used to fit the kinetic data (e.g., [8,25]).

In this contribution, we trace possible proton transfer pathways starting from the
L ground-state species. The discussion of the energetics of the thermal proton transfer
reactions in the L to M step requires the calculation of the pKa values for donor
and acceptor in the reactant and product proton configurations (see Methods). A
quick overview of the proton transfer capabilities of different protein structures is possible
through the construction of a reference proton configuration in which the protonation
sites (RSB, D85, D212, and X1) are unoccupied [see Table III(b)]. This simulates
the transfer of the Schiff base proton to the aqueous phase as a hypothetical
intermediate step in a thermodynamic cycle going from the reactant to the product proton
configuration. For the BR structure the smallest calculated pKa difference between donor
(RSB) and acceptor (D85) is 11 pH units. As the pKa values approach each other
in the L structure, this difference is reduced to 2.5 units. The results also show that
the transfer to D212 in the L structure is energetically less favorable (3.6 pH units).
The role of D85 as the catalytic proton release binding site is supported by much
self-consistent experimental evidence [9,13,14]. On the other hand, mutations of the
other aspartate (D212) in the active site revealed that this residue is necessary for
efficient proton release [13,14]. Our results reveal its dominant contribution in
adjusting the pKa value of the retinal Schiff base. Presumably, D212 plays an
essentially structural role. It may be protonated in the second half of the photocycle
[10], but little can be said definitively about its involvement in proton transfer.
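
The donor-acceptor differences quoted above can be read off the reference-state values of Table III(b), as the following Python sketch shows. The conversion of a pKa difference into a free energy via ΔG = ln(10)·RT·ΔpKa and the temperature of 298.15 K are assumptions made for this illustration; the dictionary and function names are hypothetical.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol*K)

# Reference-state pKa values taken from Table III(b)
REFERENCE_PKA = {
    "BR": {"RSB": 15.2, "D212": 4.5, "D85": 4.4},
    "L": {"RSB": 11.7, "D212": 8.1, "D85": 9.2},
}


def delta_pka(structure, donor, acceptor):
    """pKa(donor) - pKa(acceptor) in the reference proton configuration."""
    values = REFERENCE_PKA[structure]
    return values[donor] - values[acceptor]


def free_energy_kcal(dpka, temperature=298.15):
    """Assumed conversion: dG = ln(10) * R * T * dpKa, in kcal/mol."""
    return math.log(10) * R_KCAL * temperature * dpka


GAPS = {
    ("BR", "D85"): delta_pka("BR", "RSB", "D85"),
    ("L", "D85"): delta_pka("L", "RSB", "D85"),
    ("L", "D212"): delta_pka("L", "RSB", "D212"),
}

if __name__ == "__main__":
    for (structure, acceptor), gap in GAPS.items():
        print(f"{structure}: RSB -> {acceptor}: {gap:.1f} pK units, "
              f"{free_energy_kcal(gap):.1f} kcal/mol")
```

With these assumptions one pK unit corresponds to roughly 1.4 kcal/mol at room temperature, so the reduction of the RSB-D85 gap between the BR and L structures amounts to more than 10 kcal/mol.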

Our results indicate the appropriate pKa shift for donor and acceptor residue
(D85), but the structure reveals that the distance for direct proton transfer is too
large (5.1 Å). Therefore, another residue must act as an intermediate proton binding
site. Possible transient acceptors are the tightly bound water X1 and the protein
backbone peptide group (position 212, helix G) next to the Schiff base proton (Fig.
2). Our results indicate that the deprotonation of the Schiff base is initiated by the
electrostatic interaction with this main chain oxygen. The measured activation
energy for the L → M step in the wild type, 13.5 kcal/mol [8], is much
larger than expected from the determined pKa difference between donor and acceptor
[8,15] of 0.6 pH units (ΔG# ≈ 1 kcal/mol), which also points to the involvement of an
additional kinetic step.

Starting from the static L structure, the calculated activation energies for the
transfer of the retinal Schiff base proton to water X1 or the peptide group are 20
kcal/mol and 18 kcal/mol, respectively. For both product states, we monitored the
relaxation of chromophore and protein by means of molecular mechanics. The
subsequent calculation of the acid-base equilibria shows that the proton affinities
for the (hypothetical M) structure with the protonated main chain peptide group
are largely unchanged from those found for the L intermediate, with a slightly favored
transfer to aspartate D85. Even so, this state could act as a “virtual intermediate”
and reduce the activation barrier for subsequent transfer reactions.

On the other hand, the protein structural reorientations that accompany the
transfer of the proton from the Schiff base to water X1 change the proton affinities
profoundly. The pKa values for hydronium X1 and D85 increase drastically to a
value near 18 for both groups in the reference state. The equalized affinities for X1
and D85 imply that this structure can serve as a barrierless “switch state”: X1 acts
as a transient proton binding site and catalyzes the preparation of the release pathway
for optimal unidirectional transport to D85. If the pH is larger than pKa (RSB, 4.1),
the proton is shared between X1 and D85. The relative distribution is a function
of pH: For pH > 6, the proton is predominantly localized on X1 (e.g., for pH 7,
we find $\langle x_{\mathrm{X1}} \rangle = 0.61$ and $\langle x_{\mathrm{D85}} \rangle = 0.39$), for pH 6 the average degree of protonation
for both sites is 0.5, and for pH < 6 the proton is mainly found on aspartate D85
(e.g., pH 3: $\langle x_{\mathrm{X1}} \rangle = 0.03$ and $\langle x_{\mathrm{D85}} \rangle = 0.97$). The dissipation of 20 kcal/mol of
electrostatic energy (compared to L) and the increased activation energy (to 28 kcal/mol)
for the back reaction ensure the irreversibility of the transfer step.

The  M  Intermediate(s) 

During the lifetime of the M intermediates with deprotonated retinal Schiff base,
two kinetically different steps occur: One is a “reorientation step” [8], which
guarantees that the reaction proceeds mainly in the forward direction; the other step involves
pH-dependent proton release to the extracellular side and proton uptake from the
cytoplasmic side of the membrane [24-26]. Both the nature of the “reorientation
step” and its location in the photocycle are disputed. One class of models [8,63]
expects two M intermediates (M1 and M2), one with the Schiff base oriented towards
the proton release pathway and one oriented towards the proton uptake pathway.
Consequently, the L to M1 step is connected to proton release, and the M2 to N step is
connected to proton uptake. The other group of models [17,21,22,23] argues for
reversible changes in the protein after M formation, which contribute to the
restructuring of the central proton binding site. The relative sequence of protein
reorientation and proton release/uptake step, however, is unclear. The analysis of
the pH dependence of the proton release reaction argues for the existence of an
extracellular proton release group (XH, [1,24]) whose pKa determines the pathway
for further reactions: At a pH > pKa(XH) the photocycle proceeds with
deprotonation of the release group. The L to M sequence is therefore resolved into [25]
either


L(0) ⇌ M1(0) ⇌ M1(−) + H+ (to bulk) ⇌ M2(−)
or

L(0) ⇌ M1(0) ⇌ M2(0) ⇌ M2(−) + H+ (to bulk).

The  symbols  in  parentheses  indicate  the  number  of  protons  compared  to  the  BR 
ground-state.  At  a  lower  pH  the  cycle  proceeds  without  deprotonation,  and  the 
proton  release  is  delayed  until  after  the  uptake  from  the  cytoplasmic  side. 

As dichroism measurements [64] give no hint of a changed orientation of the
retinal throughout the lifetime of the M intermediates, the reorientation step is
likely to be localized in the protein. The necessity of a protein conformational
change for proton pumping is also clearly demonstrated by low-temperature
experiments [23]: Freezing of the protein results in a reprotonation of the RSB from D85;
no pumping occurs.

M1: Our analysis of possible L → M steps reveals that the transient binding of
the Schiff base proton on water X1 leads to the first stable M intermediate (named
M1; see Fig. 3), where the proton is localized on aspartate D85. We followed the
accompanying reorientation of chromophore and protein by means of molecular
mechanics and electrostatic calculations. Compared to the preceding step, no
further reduction in ground-state free energy occurs. The strong interaction with the
backbone, which we found for the L intermediate, is relaxed. The positively charged
arginine R82 moves further away from the active site towards the extracellular side
of the membrane, a motion that was initiated by the positively charged hydronium
in the preceding step. Increased electrostatic interactions strengthen the connection
between D85, the tightly bound waters (X1, X2), and threonine T89 in the possible
proton uptake channel.

These changed electrostatic interactions lead to shifts in the pKa values. Compared to the L or BR structure, we find the situation now inverted: D85 is the high-pKa group, which is always protonated, and the RSB is the low-pKa residue (6.2). In order to
Figure 3. M1 intermediate: stereoview of retinal and functionally important residues in the proton release pathway.


discuss the titration behavior of the individual sites for this intermediate, we analyze the lowest energy proton configurations as a function of pH [Table III(a)]. For the pH region above the pKa (RSB, 6.2), the M1 intermediate with deprotonated RSB will be accumulated. If the pH is below this pKa, the charge in the protein is determined by the uptake of a proton by the Schiff base from the aqueous medium (Fig. 5).
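
The pH dependence described here can be illustrated with the standard single-site titration curve. The following is only a minimal sketch: it assumes an isolated site with no coupling to other residues, whereas the calculation in the text includes such couplings, and the function name is illustrative.

```python
def protonated_fraction(pH: float, pKa: float) -> float:
    """Fraction of a single, non-interacting site that carries its proton
    (Henderson-Hasselbalch relation)."""
    return 1.0 / (1.0 + 10.0 ** (pH - pKa))


# RSB in the M1 intermediate, using the pKa of 6.2 quoted in the text.
for pH in (4.0, 6.2, 8.0):
    print(pH, round(protonated_fraction(pH, 6.2), 3))
```

At pH equal to the pKa the site is half protonated; well above it the deprotonated form dominates, which matches the accumulation of M1 with deprotonated RSB.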

We also investigated the possible involvement of one of the tightly bound waters in the binding pocket as an internal donor for the reprotonation of the RSB. We find that an activation energy of at least 20 kcal/mol is necessary for proton transfer from X2 to the RSB. This value is considerably lower for the M1 structure than for L.

The M1-M2 Step: In [24], the hydrogen-bonded complex of R82, Y57, and a bound water or hydronium ion is discussed as a possible candidate for the proton release (XH) group. Our results give no hint of a favorable deprotonation of any of these in our M1 structure. On the other hand, it was found that arginine R82 is an essential residue which catalyzes the proton release [17]. We searched, therefore, for a residue located near the extracellular side with a high enough pKa to be protonated in the initial state and whose pKa can be influenced by the positive charge on R82. Our candidate for the proton release group is glutamate E204. The additional M1(0) → M2(0) step, which precedes the proton release, is, therefore, a reorientation of R82 from the protein interior (near the Schiff base) towards the aqueous interface. In the structure both orientations of R82 are possible [18]. pKa calculations [32] and experiments with R82 mutants [15] suggest that in the initial state the geometry that orients R82 towards the Schiff base and D85 is more likely.
A twist of the three single bonds in the arginine residue needs ca. 10 kcal/mol activation energy and brings R82 quite close to glutamate E204. As there is no direct counterion near this glutamate residue, its initial high pKa is comparable to the one calculated for D96. A pKa increase in the BR ground state for E204 was also found in another theoretical investigation [32]. The reorientation of the positive charge of R82 stabilizes the negative charge on E204. The motion of R82 during the photocycle was proposed earlier [10,17,54]. Large protein conformational changes during the lifetime of M were found experimentally [59]. The strong pH dependence of the M2 electrogenicity indicates [65] that the M1 to M2 transition involves complex charge motions, as is expected for the proposed conformational change of the protein. The experimental M1 → M2 rate constant (2 × 10^4 s^-1 [8,25]) is at the upper limit for large conformational transitions in the protein.
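
As a rough plausibility check, a rate constant can be converted into an effective free energy barrier with the Eyring equation. The sketch below is not part of the original analysis: it assumes a temperature of 298.15 K and a transmission coefficient of 1, and the function name is illustrative.

```python
import math

K_B = 1.380649e-23      # Boltzmann constant, J/K
H_PLANCK = 6.62607015e-34  # Planck constant, J*s
R_KCAL = 1.987204e-3    # gas constant, kcal/(mol*K)


def eyring_barrier(rate_per_s: float, temperature_k: float = 298.15) -> float:
    """Effective free energy barrier (kcal/mol) for a given rate constant,
    from k = (kB*T/h) * exp(-dG/RT) with transmission coefficient 1 (assumed)."""
    prefactor = K_B * temperature_k / H_PLANCK
    return R_KCAL * temperature_k * math.log(prefactor / rate_per_s)


# Experimental M1 -> M2 rate constant quoted in the text: 2 x 10^4 s^-1.
print(round(eyring_barrier(2.0e4), 1))
```

Under these assumptions the barrier comes out between 11 and 12 kcal/mol, the same order of magnitude as the ca. 10 kcal/mol quoted for the twist of the arginine side chain.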

M2: Figure 4 shows the equilibrated protein structure after reorientation of arginine R82 from its location near the active site to the extracellular side of the membrane (M2 intermediate). The most prominent change in the electrostatic interactions is a reduction of the R82-D85 interaction by 10 pH units and the concomitant increase of the R82-E204 coupling by the same amount. The interaction between the Schiff base and X1 increases by 2 pH units; the strong interaction between D85, X1, X2, and T89 is maintained.

The titration behavior is determined by four energetically close-lying proton configurations, whose relative contribution is a function of the pH and accounts for the complex uptake/release pattern [Fig. 5 and Table III(a)]. The differences in


Figure  4.  M2  intermediate — stereoview  of  retinal  and  functionally  important  residues  in 
the  proton  release  pathway. 

Figure 5. Calculated total charge, as a function of pH, for a Boltzmann-weighted distribution of ionization states for titratable residues in the L, M1, and M2 intermediates (reference BR = 0).


protonation of the sites lead to multiple pKa values for the residues. As discussed earlier, the pKa shifts are proportional to the interaction energy values W_ij. For pH < pKa (RSB, 6.0), neither the Schiff base nor the glutamate E204 will deprotonate. Therefore, we find a net proton uptake compared to the L or BR structure. In an intermediate pH region ranging from pKa(RSB, 6.0) to pKa(E204, 8.5), two almost isoenergetic proton configurations determine the ground state. In both of them, the protein remains neutral. The proton is localized either on the Schiff base or on the glutamate E204 residue. The deprotonation of E204 raises the pKa of the RSB by 2.5 units to pKa(RSB) = 8.5. If the pH is increased to a value above 8.5, the Schiff base and E204 deprotonate and a net deficit of one proton in the protein occurs.
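
The three pH regimes can be reproduced qualitatively with a toy two-site model. The sketch below is an illustration only, not the multi-site calculation of the paper: it assumes that the RSB and E204 each have a pKa of 6.0 while the other site is protonated, that removing the second proton costs an extra 2.5 pK units, and that the charge is counted relative to BR = 0. All names are hypothetical.

```python
import itertools


def mean_charge(pH, pKa_rsb=6.0, pKa_e204=6.0, coupling=2.5):
    """Boltzmann-weighted net proton count of a two-site model,
    relative to the BR ground state (one proton on the pair = 0)."""
    weights = []
    charges = []
    for rsb_h, e204_h in itertools.product((1, 0), repeat=2):
        # Free energy in units of RT*ln(10), relative to the doubly
        # protonated configuration.
        g = (1 - rsb_h) * (pKa_rsb - pH) + (1 - e204_h) * (pKa_e204 - pH)
        if rsb_h == 0 and e204_h == 0:
            g += coupling  # second deprotonation is harder by 'coupling'
        weights.append(10.0 ** (-g))
        charges.append(rsb_h + e204_h - 1)
    total = sum(weights)
    return sum(w * q for w, q in zip(weights, charges)) / total


for pH in (4.0, 7.25, 10.0):
    print(pH, round(mean_charge(pH), 2))
```

With these assumed parameters the model gives a net uptake of nearly one proton at low pH, a neutral protein in the intermediate region, and a deficit of nearly one proton at high pH.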

The results show that the postswitch arrangement of the protein is a prerequisite for the deprotonation of E204, the proton release group. The proton configuration with net proton uptake can be identified with the next intermediate (N) in the photocycle. Our calculations propose reprotonation of the retinal Schiff base from the aqueous medium, in contrast to the experimentally found reprotonation from a residue located in the cytoplasmic uptake channel (e.g., D96). However, the consistent modeling of the intramolecular proton transfer in the uptake phase requires several further steps in our calculation, e.g., the explicit generation of the equilibrated protein structures for M2(−) and M2(+) as well as an improved description of the solvation for the water-exposed residues.
Conclusions

Starting from the medium-resolution structure of Henderson, this investigation presents a detailed model for the molecular mechanisms which lead to proton release in the first half of the bacteriorhodopsin photocycle. The essential features of the intermediates (structure, hydration, titration, spectroscopy) turn out to be consistent with experimental results. In particular, the discussion of the pairwise electrostatic interactions in the BR ground state gives insight into the individual contributions to pKa shifts and is in agreement with mutation experiments.

Isomerization-induced protein conformational changes shift the acid-base equilibria of chromophore and key residues. These pKa shifts are used as guidance for the discussion of possible proton transfer pathways. We propose the following sequence for the L to M step:


L → M† → M1 → M2, with the protonation state of M2 depending on pH:

    pH < 6.0 (+H+):      RSBH+, D85-H, E204-H
    6.0 < pH < 8.5:      RSBH+, D85-H, E204−  or  RSB, D85-H, E204-H
    pH > 8.5 (−H+):      RSB, D85-H, E204−

L → M†: The deprotonation of the Schiff base is initiated by the electrostatic interaction with the main chain oxygen (position D212, helix G). A tightly bound water molecule (X1 in Fig. 1) acts as transient proton binding site; free energy is dissipated to the protein (irreversible switch step).

M† → M1: Proton transfer from X1 to aspartate D85.

M1 → M2: Arginine R82 moves towards the extracellular side and induces a pKa reduction of the proton release group, identified as glutamate E204 (protein reorientation step).

M2 →: pH-dependent proton release and uptake.


Besides this linear sequence including back reactions, the existence of parallel photocycles starting from different BR species cannot be excluded. Our results point to pH-dependent charge heterogeneities, which vary during the photocycle.
Abstract

Triglycerides, the major components of neutral lipids, are important biomaterials, as they take part in the formation of membranes. The consideration of biological membranes at a molecular level therefore requires detailed knowledge of the preferred conformations of the triglycerides in their various polymorphic forms. In this context, we adapted a molecular modeling approach that allows the simulation of the three-dimensional structure of the different polymorphic forms (α, β′, and β) and is valid for any triglyceride. The conformational analysis is based on molecular mechanics calculations, as follows: First, a large number of isolated molecular structures were generated in a systematic structure-tree analysis. For their generation, atomic charges within the Mulliken scheme, calculated at the ab initio RHF-LCAO-MO-SCF level (6-31G), were considered. The lowest-energy conformers were next correlated with experimental data (NMR, powder X-ray diffraction) in order to select α, β′, or β structures. Then, in a second step, these selected conformers were assembled in head-to-tail dimers in order to form a monolayer. For this step, the use of potential-derived atomic charges is known to be more suitable. In this study, we consider triglycerides derived from the predominant fatty acids, i.e., stearic, elaidic, and oleic acids. © 1994 John Wiley & Sons, Inc.


Introduction 

Triglycerides, triesters of glycerol and fatty acids, are important raw materials. Being the main components of natural fats, they are widely used in the food industry [1]. However, they also fulfill important biochemical functions (formation of membranes, transport of fats, etc.) [2-5]. The nature of the fatty acids, saturated or not, as well as their position on the glycerol backbone, generates a large number of triglyceridic structures: monoacid, diacid, asymmetric, symmetric, mixed saturated and unsaturated, etc. Furthermore, triglycerides may exist in various polymorphic forms, α, β′, or β, according to different lateral crystal packing. The α form structure is similar to the melted state, where hydrocarbon chains are in free rotation, and presents a tuning-fork conformation [Fig. 1(a)]. The β′ and β forms

Figure 1. Schematization of the polymorphic (a) α form: tuning-fork and (b) β′ and β forms: chair.


present a chair structure [Fig. 1(b)], in which the chains are tilted. The presence of different types of unsaturation influences the existence and the stability of the polymorphic forms [6,7]. Moreover, triglycerides may also pack in different longitudinal arrangements, i.e., in double (L-2) or triple (L-3) hydrocarbon chain lengths (Fig. 2). For example, monoacid triglycerides crystallize in a double hydrocarbon chain length, whereas cis unsaturated diacid compounds pack in a triple hydrocarbon chain length.

The diversity of such molecules, as well as their associated polymorphism, confers on these lipids a complex structural behavior, which is related to particular physicochemical properties such as fluidity or biochemical activity [8]. It is thus essential to have a thorough knowledge of the structure and molecular organization of the different polymorphic forms in order to understand or even predict some particular membrane characteristics. Unfortunately, little structural information from single-crystal X-ray diffraction studies is available [9-12], and, moreover, the method can be applied only to stable polymorphic forms.

For these reasons, we wished to develop a molecular modeling approach to simulate the three-dimensional structure of the α, β′, or β crystalline forms of any




Figure  2.  Schematization  of  the  longitudinal  packing  according  to  (a)  a  double  L-2  and 
(b)  a  triple  L-3  hydrocarbon  chain  length. 
triglyceride. Our attention was, therefore, directed toward a molecular mechanics approach already described in the literature and successfully applied to phospholipid membranes, lipid helices, drugs, and peptides [13].

In this study, we focus on the particular structural behavior of monoacid triglycerides derived from stearic, elaidic, and oleic acids (the most abundant fatty acids), i.e., tristearin (which is saturated), trielaidin (trans unsaturated), and triolein (cis unsaturated) (Fig. 3).

The article is organized as follows: In the first part, the strategy concerning the conformational analysis of isolated conformers as well as their assembly is described, by considering trilaurin, a C12 triglyceride whose three-dimensional structure is available [10]. In this step, particular focus is put on the calculation of the atomic charges needed in the electrostatic contribution.

In the second part, the procedure defined is applied to tristearin, trielaidin, and the triolein model: three C18 triglycerides. As for trilaurin, the three-dimensional structure of isolated compounds is analyzed first. Particular conformations are then retained by correlation with experimental data (13C MAS-CP NMR, powder X-ray diffraction). These conformers are then assembled in head-to-tail bimolecular conformations, whose stability is also correlated with experimental values (differential scanning calorimetry). Finally, the lowest-energy dimers are associated to form a monolayer, thereby approaching the three-dimensional packing of the different polymorphic forms.


CH2-O-CO-R
|
CH-O-CO-R
|
CH2-O-CO-R

Triglyceride   Chain length   Unsaturation            Radical (R)
Trilaurin      C12            saturated               -(CH2)10-CH3
Tristearin     C18            saturated               -(CH2)16-CH3
Triolein       C18            one cis double bond     -(CH2)7-CH=CH-(CH2)7-CH3
Trielaidin     C18            one trans double bond   -(CH2)7-CH=CH-(CH2)7-CH3

Reduced model:

CH2-O-CO-(CH2)2-CH3    (I)
|
CH-O-CO-(CH2)4-CH3     (II)
|
CH2-O-CO-(CH2)2-CH3    (III)

Figure 3. Structure formulas of the considered triglycerides.
Methods and Strategy

The conformational analysis method chosen, widely applied to lipid and phospholipid membranes, is based on molecular mechanics calculations and involves a three-step procedure [13]: first, the generation of the three-dimensional structure of an isolated molecule; second, the assembly of two molecules to form a dimer; and third, the formation of a monolayer, by association of dimers.

Conformational  Analysis  of  the  Isolated  Molecule 

For the generation of the isolated models to be used later for the assemblies, the procedure has two steps: first, a structure-tree analysis to generate a certain number of conformers of high probability (low energy) and, second, an energy minimization of the high-probability conformers obtained. Both steps require a potential function. The total conformational energy is considered as a sum of three terms:

1. A van der Waals energy contribution defined by a Buckingham pairwise function between all pairs of nonbonded atoms i and j:

$$E_{vdw} = \sum_{ij} \left[ A_{ij} \exp(-B_{ij} r_{ij}) - \frac{C_{ij}}{r_{ij}^{6}} \right], \qquad (1)$$

where r_ij is the distance between atoms i and j, and A_ij, B_ij, and C_ij are coefficients assigned to atom pairs. The values chosen for the constants are those proposed by Liquori and co-workers [14,15].

2. An electrostatic energy contribution described by a Coulombic interaction term between atomic point charges:

$$E_{elec} = 332 \sum_{ij} \frac{q_i q_j}{\varepsilon_{ij} r_{ij}}, \qquad (2)$$

where ε_ij is the dielectric constant, and q_i and q_j are the atomic charges calculated as described later. To simulate biomembranes, mainly constituted by phospholipids of 16 or 18 carbon atoms (mean length of 20 Å), one has to consider, depending on the distance between each pair of atoms, a linear variation of ε_ij from 1 to 16 (ε = 16 is the value for a phosphate group) up to a distance of 20 Å. E_elec is given in kcal/mol when r_ij is expressed in Å and q_i and q_j in electron charge units (332 is a conversion factor, allowing the expression of the energy in kcal/mol).

3. A torsional energy contribution defined as

$$E_{tor} = \sum_{ij} \frac{U_{ij}}{2} \left( 1 + \cos \phi_{ij} \right), \qquad (3)$$

where φ_ij corresponds to the various torsional angles around the C-C and C-O single bonds, and U_ij to the energy barrier between eclipsed and staggered conformations. U_ij is set to 2.8 kcal/mol for a C-C bond and 1.8 kcal/mol for a C-O bond [16].
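
The three energy terms can be sketched as simple pair functions. This is a minimal illustration under stated assumptions, not the PC-TAMMO+ implementation: it assumes the standard Buckingham form A·exp(−B·r) − C/r^6, a torsional term (U/2)(1 + cos φ), and a dielectric that rises linearly from 1 at zero separation to 16 at 20 Å and stays constant beyond. The function names and the choice of the interpolation start point are hypothetical.

```python
import math


def dielectric(r, eps_min=1.0, eps_max=16.0, r_max=20.0):
    """Distance-dependent dielectric: linear from eps_min at r = 0 to
    eps_max at r_max (in angstroms), constant beyond (assumed form)."""
    if r >= r_max:
        return eps_max
    return eps_min + (eps_max - eps_min) * r / r_max


def e_vdw(r, a, b, c):
    """Buckingham pair energy: A*exp(-B*r) - C/r**6 (standard form)."""
    return a * math.exp(-b * r) - c / r ** 6


def e_elec(r, qi, qj):
    """Coulomb pair energy in kcal/mol for r in angstroms and charges in
    electron units; 332 is the conversion factor quoted in the text."""
    return 332.0 * qi * qj / (dielectric(r) * r)


def e_tors(phi_deg, barrier):
    """Torsional energy (U/2)*(1 + cos(phi)) for one rotatable bond."""
    return 0.5 * barrier * (1.0 + math.cos(math.radians(phi_deg)))


# C-C torsion with the 2.8 kcal/mol barrier quoted in the text.
print(e_tors(0.0, 2.8), e_tors(180.0, 2.8))
```

The total conformational energy of a molecule would then be the sum of these terms over all nonbonded pairs and all rotatable bonds.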

Starting from an all-trans geometry, and considering standard values for the interatomic distances and bond angles [16], a large number of conformers is generated by a structure-tree analysis [17]. If systematic 60° changes are applied for n selected torsional angles, 6^n conformers are generated. For each conformer, the total energy is computed as well as the statistical weight (according to the Maxwell-Boltzmann equation).
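
The enumeration and weighting step can be sketched as follows. This is an illustrative sketch only: the energy function is a placeholder that sums a torsional term per angle rather than the full potential, the temperature (RT ≈ 0.593 kcal/mol, about 298 K) is an assumption, and all names are hypothetical.

```python
import itertools
import math

RT = 0.593  # kcal/mol at roughly 298 K (assumed temperature)


def enumerate_conformers(n_angles, step_deg=60):
    """All torsion-angle combinations on a regular grid: (360/step)**n tuples."""
    values = range(0, 360, step_deg)
    return list(itertools.product(values, repeat=n_angles))


def boltzmann_weights(energies, rt=RT):
    """Normalized weights exp(-E/RT); energies are shifted by the minimum
    for numerical stability."""
    e_min = min(energies)
    factors = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(factors)
    return [f / total for f in factors]


def toy_energy(angles, barrier=2.8):
    """Placeholder energy: torsional term only, not the full potential."""
    return sum(0.5 * barrier * (1.0 + math.cos(math.radians(a))) for a in angles)


conformers = enumerate_conformers(3)
weights = boltzmann_weights([toy_energy(c) for c in conformers])
kept = [c for c, w in zip(conformers, weights) if w > 0.01]
print(len(conformers), len(kept))
```

Only the conformers whose weight exceeds the chosen threshold would be passed on to the minimization step.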

The leaves of the trees, i.e., the structures having a statistical weight above 1%, are then submitted to a simplex minimization procedure, with a precision of 5° for each torsional angle [18]. At the end of the minimization, a probability of existence is associated with each of the three-dimensional conformers generated.

Assembly  of  Monolayers 

For the assembly of dimers, as well as that of monolayers, the total energy is calculated as the sum of three terms: the van der Waals contribution (E_vdw) and the electrostatic energy (E_elec) as defined above, plus a transfer energy contribution (E_tr). No torsional term is considered here, as the models are kept rigid. The transfer energy term is defined for a molecule as the sum of all transfer energy changes associated with the transfer of each individual atom from the hydrophilic to the hydrophobic phase [13]. These values of transfer energy per atom have been derived from total transfer energies compiled for a series of chemical analogs [19] and are equal to −1.5 kcal/mol for a Csp2 atom, −2.4 kcal/mol for a Csp3 atom, 1.0 kcal/mol for H, and 2.8 kcal/mol for O [20].
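
The additive transfer energy can be illustrated with a short sketch. It uses the per-atom values quoted above and assumes, for simplicity, that every atom of the molecule is transferred between phases; the actual procedure evaluates each atom according to its position. The atom counts are those of tristearin (C57H110O6, with the three ester carbonyl carbons counted as sp2), and the names are illustrative.

```python
# Per-atom transfer energies in kcal/mol, as quoted in the text.
TRANSFER_ENERGY = {"Csp2": -1.5, "Csp3": -2.4, "H": 1.0, "O": 2.8}


def transfer_energy(atom_counts):
    """Sum of per-atom transfer energies for a dictionary {atom type: count}."""
    return sum(TRANSFER_ENERGY[kind] * n for kind, n in atom_counts.items())


# Tristearin, C57H110O6: 3 carbonyl (sp2) carbons and 54 sp3 carbons.
tristearin = {"Csp2": 3, "Csp3": 54, "H": 110, "O": 6}
print(round(transfer_energy(tristearin), 1))
```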

In the first step, to mimic the crystalline structure in which molecules are oriented in opposite directions, the conformers are assembled in a head-to-tail bimolecular configuration. One molecule, previously rotated by 180°, is moved toward another, taking into account steric constraints. The system is then submitted to an interaction energy minimization procedure, by translation (step of 0.5 Å) and rotation (step of 2.5°) along and about the three orthogonal axes. Starting from different bimolecular approaches, this sequential assembly can lead to different dimer geometries, all located within a range of energies of only ±5 kcal/mol. In our case, only the lowest-energy dimers are retained. In a second step, using the same assembly procedure, dimers are brought toward each other and their interaction energy is minimized. The procedure is then repeated until a monolayer of 16 molecules (8 dimers) is formed.
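
A rigid-body search of this kind can be sketched as a greedy descent over discrete moves. The example below is a simplified illustration, not the PC-TAMMO+ procedure: it handles translations only (rotations are omitted), and it replaces the full interaction energy with a hypothetical 12-6 pair term whose minimum lies at 3.8 Å.

```python
import math


def pair_energy(r, r_min=3.8):
    """Illustrative 12-6 pair term with minimum -1 at r_min (a stand-in
    for the van der Waals, electrostatic, and transfer contributions)."""
    x = r_min / r
    return x ** 12 - 2.0 * x ** 6


def interaction_energy(mol_a, mol_b):
    """Sum of pair terms between all atoms of two rigid molecules."""
    return sum(pair_energy(math.dist(p, q)) for p in mol_a for q in mol_b)


def translate(mol, shift):
    """Return a copy of the molecule shifted by the vector 'shift'."""
    return [tuple(c + s for c, s in zip(atom, shift)) for atom in mol]


def minimize_by_translation(mol_a, mol_b, step=0.5, max_sweeps=200):
    """Move rigid molecule B in +/- step along x, y, z, accepting any move
    that lowers the interaction energy, until no move helps."""
    moves = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            move = [0.0, 0.0, 0.0]
            move[axis] = sign * step
            moves.append(tuple(move))
    best = interaction_energy(mol_a, mol_b)
    for _ in range(max_sweeps):
        improved = False
        for move in moves:
            trial = translate(mol_b, move)
            energy = interaction_energy(mol_a, trial)
            if energy < best:
                mol_b, best, improved = trial, energy, True
        if not improved:
            break
    return mol_b, best


final, energy = minimize_by_translation([(0.0, 0.0, 0.0)], [(6.0, 0.0, 0.0)])
print(final, round(energy, 3))
```

Because the moves are restricted to a 0.5 Å grid, the search stops at the grid point closest to the optimum rather than at the exact minimum, which is also why different starting approaches can end in different dimer geometries.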

The generation of both the isolated forms and the monolayers was performed using the PC-Molecular and PC-TAMMO+ (Theoretical Analysis of Membrane Molecular Organization) procedures [13], on an Olivetti CP486 microcomputer equipped with an Intel 80486 processor. Graphics were drawn with the PC-MGM+ (Molecular Graphics Manipulation) program [13].

Calculation  of  Atomic  Charges 

Although it is usually accepted that the bonding terms are satisfactorily described by various energy functions (such as the ones proposed for the van der Waals or dihedral contribution) for biological macromolecules, there is more uncertainty regarding the electrostatic contributions. Because they vary slowly with distance, the best possible treatment of the Coulomb term is essential, especially for the evaluation of conformational equilibria and thermodynamic properties. In our case, as the electrostatic term may contribute a large part of the total energy, the most precise possible evaluation of the atomic charges is desirable. More precisely, it has already been shown in a previous work, dedicated to the comparison of atomic charges to be integrated into conformational analyses of neutral lipids, that charges generated by semiempirical methods such as AM1 were not suitable [21]. Hence, the charges for our model were calculated at the ab initio RHF-LCAO-MO-SCF level [22], with the GAUSSIAN 92 series of programs [23]. Using the 6-31G basis set, the atomic charges were obtained both from the Mulliken approach [24] and from the molecular electrostatic potential, as proposed by Kollman [25,26]. Their respective application as well as the choice of the basis set is detailed in the discussion.

All  ab  initio  MO  computations  were  performed  on  a  cluster  of  IBM  RISC  6000 
Model  560  computers  of  the  Scientific  Computing  Facility  Center  (Namur-SCF) 
of  the  University  of  Namur. 


Results  and  Discussion 

Atomic  Point  Charge  Evaluation 

First, several test calculations were performed on reduced models of trilaurin (i.e., the glycerol moiety with 3 and 5 methylene groups for chain II and chains I and III, respectively [Fig. 3]) considered in its crystalline state [9]. The hydrogens were positioned with standard distances, bond angles, and torsional angles depending on the carrier atom. The atomic charges were derived from ca. 2500 molecular electrostatic potential values considered around the molecule with three different basis sets: 6-31G, 6-31G*, and 6-31G** (* = +d polarization functions on heavy atoms and ** = +d functions on heavy atoms and p functions on hydrogens, resulting in 280, 424, and 520 basis functions, respectively). The maximum difference between the charges obtained with the two basis sets including polarization functions is very small, only ±0.04 e⁻, whereas the largest difference between the values obtained with 6-31G and 6-31G* can reach ±0.2 e⁻ in the case of the carbonyl atoms. Bearing those differences in mind, the 6-31G basis set was chosen for the further computations as a cost-quality compromise. Within the 6-31G basis set, the number of basis functions is 535, 775, 775, and 787 for trilaurin, trielaidin, triolein, and tristearin, respectively. The calculations were done using the direct SCF approach with the standard bielectron integral and convergence thresholds for single-point calculations.

For the structure-tree generation, since the torsional angles of the molecule are continuously varying, the use of potential-derived charges is inadequate, as they are extremely dependent upon the overall conformation. Hence, charges obtained with the Mulliken population analysis were taken into account in the electrostatic contribution when generating the three-dimensional structure of the isolated model.
More particularly, we considered the Mulliken atomic charge values of the glycerol region and the terminal carbons for the trilaurin and tristearin models and, when needed, appropriate charges around the double bonds for trielaidin and triolein (Fig. 4).

However,  for  the  assembly  of  dimers  and  monolayers,  since  the  total  conformation 
of  the  molecule  is  then  kept  rigid  (only  translation  and  rotation  are  taken  into 
account),  the  use  of  charges  derived  from  the  surrounding  potential,  and  thus  more 
sensitive  to  the  global  molecular  conformation,  can  be  considered.  It  is  important 
to  stress  that  their  use  has  been  recommended  when  considering  intermolecular 
interactions  [25,26].  Hence,  potential-derived  charges  were  calculated  for  each  of 
the  conformers  that  will  be  selected  for  the  assembly. 
Conformation Analysis Strategy

To adapt the procedure to neutral lipids, a detailed study of the structure of trilaurin, a monoacid C12 triglyceride for which the crystalline structure is well known, was undertaken [7]. Besides the test calculations regarding the atomic charge values (basis set and method), we first considered, in the minimization step, different values of the dielectric constant (as mentioned earlier, ε was allowed to vary in the structure-tree generation part). With increasing discrete values of ε (1, 3, 16, and 80), the contribution of the electrostatic term logically decreased, but the statistical distribution of the different conformers remained unchanged. Thus, a value of ε = 1 was chosen for the further calculations. Second, three different structure-tree strategies, involving torsional twists along the three [tree no. 1, Fig. 5(a)] or two [trees nos. 2 and 3, Fig. 5(b)] hydrocarbon chains, were tested. For structure-tree no. 2, twists around chain II were not considered in order to keep the chain in the opposite direction to chains I and III. For structure-tree no. 3, twists around chain III were not allowed, to force the parallelism between chains I and III. The most stable conformers generated by structure-tree no. 1 are mostly extended conformations [Fig. 6(a)] and, in a minor portion, folded conformations [Fig. 6(b)]. The most stable conformers generated by structure-trees nos. 2 and 3 are mostly folded, and the extended ones present a tuning-fork conformation, representative of the α form only. As only extended conformations are able to fit the experimentally observed α, β′, or β form, the structure-tree analysis considering variations of the torsional angles along the three chains is preferred. The search method has been further optimized in two levels to avoid an excessively time-consuming procedure. The variation of the first eight torsional angles


Figure  5.  Torsional  angles  along  (a)  three  and  (b)  two  hydrocarbon  chains,  considered 
for  the  different  structure-tree  analyses. 
Figure 6. Classification of the modeled conformers into (a) extended structure and (b) folded structure.


[1-8, Fig. 5(a)] generates 6^8 conformers, with those having a statistical weight above 3% being kept. A second variation of angles 6-11 generates 6^6 conformers, and those having a probability of existence larger than 1% are retained. The retained models then need to be classified, and particular ones have to be selected to represent particular polymorphic forms. Therefore, the most probable modeled conformers are next divided into extended [Fig. 6(a)] and folded [Fig. 6(b)] structures, which are finally correlated with experimental data.
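
The saving from the two-level search can be made concrete with a little arithmetic. The sketch below assumes that the alternative would be a single exhaustive scan of all eleven angles on the same 60° grid, and it uses six retained first-level conformers as an example count (the number reported later for tristearin); the function name is illustrative.

```python
def grid_size(n_angles, step_deg=60):
    """Number of conformers on a regular torsion grid: (360/step)**n."""
    return (360 // step_deg) ** n_angles


full = grid_size(11)              # hypothetical single scan of angles 1-11
level1 = grid_size(8)             # first level: angles 1-8
level2_per_start = grid_size(6)   # second level: angles 6-11, per retained start
two_level = level1 + 6 * level2_per_start  # six retained first-level conformers

print(full, two_level, round(full / two_level))
```

Under these assumptions the two-level scheme evaluates roughly two million conformers instead of several hundred million.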

More precisely, hypotheses relative to the three-dimensional structure can be deduced from 13C high-resolution solid-state NMR (MAS-CP) and, more particularly, from the analysis of the chemical shifts for the carbonyl and the glyceridic carbons [27]:

• For the α form, the spectra show two peaks (for chains I and III, and chain II, respectively), indicating a symmetrical environment or a tuning-fork structure [Fig. 1(a)].

• For the β′ and β forms, one observes three peaks: the β′ and β forms present an asymmetrical or chair structure [Fig. 1(b)]. According to the NMR data for trilaurin, the extended conformers obtained can thus be classified into the α form [Fig. 7(a)] or the β′ and β forms [Fig. 7(b)].

From powder X-ray diffraction, the distances between chain-end carbons, i.e., the long spacings (Fig. 2), can be determined for each triglyceride in its various polymorphic forms (Table I). Hence, by comparison of the distances between the chain ends for the different conformers and the values of long spacings for each polymorphic form, particular conformers that are able to mimic the α, β′, and β forms were selected.

The  selected  conformers  can  then  be  assembled  into  monolayers.  To  approximate 
crystalline  lattices,  in  which  triglycerides  are  packed  in  opposite  directions,  two 
Figure 7. Classification of the eight most probable modeled conformers of trilaurin in (a) α forms (nos. 1-4) and (b) β′ or β forms (nos. 5-8).


molecules are first assembled sequentially in a head-to-tail bimolecular conformation, taking into account steric constraints, and their interaction energy is minimized as described earlier. The lowest-energy dimers are then repeated to form a monolayer.

The whole conformational analysis procedure proposed is summarized in Figure 8.


Monoacid  Triglyceride  Isolated  Conformers 

The procedure illustrated in Figure 8 has also been applied to C18 monoacid triglycerides, i.e., tristearin, trielaidin, and triolein. The results of the structure-tree analyses are presented in Figure 9. At the first level (variation of the angles 1-8), conformers having a statistical weight larger than 3% were kept: six for tristearin, five for trielaidin, and seven for triolein. Starting from these conformations, one obtains at level 2 (variation of the angles 6-11) 12, 14, and 12 low-energy structures for tristearin, trielaidin, and triolein, respectively. As explained earlier, these conformers are then minimized. The most important are presented in Figure 10. All probable conformers for triolein have a folded structure (Fig. 10), evidently


Table I. Values of the long spacings (Å) observed for the polymorphic forms of trilaurin, tristearin, and trielaidin.


α Form    β′ Form    β Form


Trilaurin 

Tristearin 

Trielaidin 


34 

53 

51 


3 

5 


caused by the presence of a cis double bond in all three chains. This structure has been shown to exist in oil-water or air-water interfaces or inside membrane phospholipids [28], but does not resemble any crystalline form reported so far. Therefore, as for the saturated trilaurin, one will focus on the tristearin and trielaidin models that produce extended structures comparable to the crystalline forms observed experimentally. Those extended models were then classified (Table II) in order to select conformers that can mimic the various polymorphic forms (Fig. 10).

Figure 9. Structure-trees of (a) tristearin, (b) trielaidin, and (c) triolein. Each tree lists: 1. Level 1 (%); 2. Level 2 (%); 3. Conformation number; 4. Relative energy before minimization (kcal/mol).

(a) Conformation number:  1     2     3     4     5     6     7     8     9     10    11    12
    Relative energy:      0.00  1.24  0.00  1.22  0.00  0.79  0.00  0.80  0.00  0.69  1.03  0.00

(b) Conformation number:  1     2     3     4     5     6     7     8     9     10    11    12    13    14
    Relative energy:      0.00  1.24  1.42  0.00  1.22  1.34  0.00  0.79  0.84  0.00  0.80  0.99  0.00  0.69

(c) Conformation number:  1     2     3     4     5     6     7     8     9     10    11    12
    Relative energy:      0.00  0.00  0.00  0.00  0.00  0.62  1.24  0.00  0.80  0.00  0.64  1.25

Figure 10. Structures of the most significant conformers obtained for triolein (OOO), tristearin (SSS), and trielaidin (EEE).
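The 3% statistical-weight cutoff used to prune the structure trees can be illustrated with a short sketch. It assumes that the statistical weight is a Boltzmann probability computed from relative energies at 298.15 K; neither the temperature nor the exact weighting formula is stated in this section, and the energies below are invented for the example.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def boltzmann_weights(rel_energies_kcal, temperature_k=298.15):
    """Boltzmann statistical weights (in %) for a list of relative energies."""
    beta = 1.0 / (R_KCAL * temperature_k)
    e_min = min(rel_energies_kcal)
    factors = [math.exp(-(e - e_min) * beta) for e in rel_energies_kcal]
    total = sum(factors)
    return [100.0 * f / total for f in factors]


def keep_above(rel_energies_kcal, threshold_percent=3.0, temperature_k=298.15):
    """Return (conformer number, weight) pairs whose weight exceeds the threshold."""
    weights = boltzmann_weights(rel_energies_kcal, temperature_k)
    return [(i + 1, w) for i, w in enumerate(weights) if w > threshold_percent]


# Hypothetical relative energies (kcal/mol) of three conformers.
example_energies = [0.0, 0.5, 5.0]
for number, weight in keep_above(example_energies):
    print(f"conformer {number}: {weight:.1f}%")
```

In this example the third conformer, 5 kcal/mol above the lowest one, falls far below the cutoff and would be discarded before the next level of the tree is explored.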

For tristearin, the structures retained were

• no. 1, a tuning-fork structure, to represent the α form, and

• no. 3, a chair structure, to represent the β′ or β forms.

For trielaidin, the chosen structures were

• no. 4, a chair structure, which can mimic a β form, and

• no. 14, a tuning-fork structure, which can mimic an α form.


Table II. Classification of the conformers obtained for tristearin and trielaidin.

                                      Extended structure
              Folded structure     Chair             Tuning-fork
Tristearin    6, 8, 9, 11, 12      3, 4              1, 2, 5, 7, 10
Trielaidin    3, 8, 11, 13         1, 2, 4, 5, 6     7, 9, 10, 12, 14

The Cartesian coordinates of the heavy atoms for the selected conformers of tristearin and trielaidin are presented in Tables III and IV, respectively.

Bimolecular  and  Monolayer  Assemblies 

The head-to-tail bimolecular assemblies are presented in Figure 11. For each type of assembly, the intra- and intermolecular distances between each pair of atoms of the three parallel chains have been determined and compared to the short spacings (Fig. 2) measured experimentally by powder X-ray diffraction for the different polymorphic forms. The mean distances calculated for conformers no. 3 of tristearin and no. 4 of trielaidin are, within ±0.2 Å, close to the short spacings characteristic of a β form, i.e., 4.6, 3.9, and 3.6 Å. The distances calculated for conformer no. 1 of tristearin vary around 4.2 Å, which is the short spacing observed for an α form.
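The short-spacing comparison can be written as a small check. The reference spacings and the ±0.2 Å tolerance are those quoted in the text; the matching rule (every reference spacing must be reproduced, and every calculated distance must lie near some reference spacing) and the sample distances are assumptions made for this sketch.

```python
# Short spacings (Å) quoted in the text for the alpha and beta forms.
SHORT_SPACINGS = {"alpha": (4.2,), "beta": (4.6, 3.9, 3.6)}


def matches_form(mean_distances, form, tol=0.2):
    """True if the calculated mean distances reproduce the short spacings of `form`."""
    refs = SHORT_SPACINGS[form]
    # Every reference spacing must be reproduced by at least one distance ...
    refs_found = all(any(abs(d - r) <= tol for d in mean_distances) for r in refs)
    # ... and every calculated distance must lie near some reference spacing.
    dists_explained = all(any(abs(d - r) <= tol for r in refs) for d in mean_distances)
    return refs_found and dists_explained


# Hypothetical mean interchain distances (Å) for two dimers.
chair_dimer = [4.5, 3.8, 3.7]
tuning_fork_dimer = [4.1, 4.3]

print(matches_form(chair_dimer, "beta"), matches_form(tuning_fork_dimer, "alpha"))
```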

Here, also, correlations have been made with experimental data from differential scanning calorimetry (DSC). By this technique, and more precisely by measuring the enthalpy of fusion (ΔH) of the α, β′, and β forms, one can evaluate the stability of the different polymorphic forms. To obtain a corresponding theoretical relative stability value for each dimer, we calculated the mean interaction energy between two molecules (Eint) as the sum of the van der Waals, electrostatic, and transfer energies.

From Table V, one can see that the interaction energy values calculated for the dimers are well correlated with those of ΔH, i.e.:

• For saturated compounds, as the hydrocarbon chain length increases, the stability of the lattice increases as well; the values of Eint are much lower for tristearin than for trilaurin.

• The β form is more stable than the α form.

• The presence of unsaturations, in the case of trielaidin, reduces the stability of both the α and β forms.

The theoretical structures obtained with our molecular modeling approach thus agree well with the available experimental data.
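The correlation between Eint and ΔH can be quantified with a Pearson coefficient, as in the following sketch. The numbers are those printed in Table V, but the pairing of each ΔH value with a given conformer row is an assumption, because the table layout was lost in extraction; the trilaurin row is omitted because no ΔH value could be attributed to it.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# (label, Eint in kcal/mol, experimental ΔH in kcal/mol).
# ASSUMPTION: the ΔH-to-row pairing is inferred from the order of the extracted values.
rows = [
    ("SSS alpha no. 1", -157.5, 25.5),
    ("SSS beta no. 3", -208.5, 44.0),
    ("EEE alpha no. 14", -67.9, 9.7),
    ("EEE beta no. 4", -110.9, 32.9),
]

e_int = [row[1] for row in rows]
delta_h = [row[2] for row in rows]
r = pearson(e_int, delta_h)
print(f"Pearson r between Eint and ΔH: {r:.2f}")
```

A negative coefficient is the expected sign here: a more negative (more stabilizing) interaction energy goes with a larger enthalpy of fusion.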

Finally, the most probable dimers obtained for trilaurin, tristearin, and trielaidin have been assembled to form a monolayer in order to simulate the three-dimensional packing of α and β forms in a double hydrocarbon chain length. For illustration, a monolayer of trielaidin, in the α and β forms, is presented in Figure 12.


Conclusions 

The resolution of the three-dimensional structure of triglycerides by single-crystal X-ray diffraction is rather complicated, essentially because single crystals are difficult to obtain. To our knowledge, only three crystalline structures have been solved to date [9-12]. In our laboratory, despite the development of a dedicated crystallization system and a large number of crystallizations of several different compounds, we finally obtained only one single crystal of β-trielaidin (twins were generally obtained), for which the structure was solved at low temperature [7]. Moreover, this method can be applied only to stable polymorphic forms.

For these major reasons, to compensate for the lack of information from single-crystal X-ray diffraction, we developed a molecular modeling technique that allows us to simulate the three-dimensional structures of the different, stable or less stable, polymorphic forms of triglycerides. First, isolated molecule conformers can be generated by a systematic structure-tree analysis followed by a minimization procedure. The total conformational energy includes van der Waals, Coulomb, and torsional contributions. Then, selected monomers can be assembled into head-to-tail dimers and later into monolayers. For these assemblies, the interaction energy is computed as a sum of van der Waals, Coulomb, and transfer energy terms. Because of the importance of the electrostatic contribution in the evaluation of both the conformational energy of the isolated molecules and the interaction energy of the dimers, the atomic charges were calculated at the ab initio MO-SCF level with the 6-31G basis set.

Table III. Cartesian coordinates of the heavy atoms for the selected conformers of tristearin.

Figure 11. Bimolecular assembly of the selected conformers of tristearin (SSS) and trielaidin (EEE). Panel labels: no. 14 (α), no. 4 (β).

Table V. Values of the interaction energy (modeling results) and of ΔH (differential scanning calorimetry measurements) determined for conformers of trilaurin (LLL), tristearin (SSS), and trielaidin (EEE).

Conformer            Eint (kcal/mol)    Experimental ΔH (kcal/mol)
LLL  no. 1                -35.3
SSS  (α) no. 1           -157.5                 25.5
SSS  (β) no. 3           -208.5                 44.0
EEE  (α) no. 14           -67.9                  9.7
EEE  (β) no. 4           -110.9                 32.9

From a detailed study of trilaurin, a C12 triglyceride and one of the few triglycerides whose structure has been solved, a work strategy has been defined and applied to the conformational analysis of C18 monoacid triglycerides, i.e., tristearin, trielaidin, and triolein.

For  the  generation  of  the  isolated  conformers,  with  the  proposed  structure-tree 
analysis  (variations  around  11  torsional  angles  along  the  three  chains)  and  the 
proposed  potential  functions  (van  der  Waals,  torsional,  and  electrostatic  terms), 
one  can  use  atomic  charges  obtained  by  the  Mulliken  population  analysis.  For  the 
assembly  of  dimers  and  monolayers,  one  needs  to  use  the  potential-derived  charges 
previously  calculated  for  each  conformer. 

The structure-tree analysis generated many low-energy conformers: 12 for tristearin, 14 for trielaidin, and 12 for triolein. For triolein, interestingly, one obtains preferentially folded structures, as found in biological systems. Tristearin and trielaidin presented mostly extended structures. For these, and by correlation with experimental data (NMR, powder X-ray diffraction), conformers representing α and β forms were retained.

The selected conformers were then sequentially assembled in head-to-tail bimolecular configurations. For each assembly, interaction energy values were calculated and compared to experimentally measured enthalpy values. The modeling results concerning the stability of the different assemblies correlate well with experimental observations: the β form is more stable than the α form, and the presence of trans unsaturations lowers the compactness of the lattice.

For monoacid triglycerides, molecular modeling can thus reproduce the structures and packing of the polymorphic forms as well as the effects of chain length and the presence of unsaturations on the stability of the crystal lattice. Moreover, it allows the simulation of less stable forms.

Presently, we are applying the same strategy to diacid triglycerides, mixed saturated and unsaturated, in order to clarify, at a molecular level, the influence of cis and trans unsaturations on the polymorphism. This work is of technological interest, as such compounds predominate in natural fats.

Abstract

Ab initio RHF results (geometry data and potential barriers) are reported for the 14 symmetry unique local minima in the potential energy surface of ε-aminohexanoic acid that contain an intramolecular N···H—O hydrogen bond. Comparison of characteristic data with those of the homologs with fewer carbon atoms shows that δ-aminopentanoic acid forms the most stable hydrogen bonds. In contrast to these homologs, the —COOH group is not limited to anti-periplanar orientation in the H-bonded conformers of ε-aminohexanoic acid: in four such conformers, it occurs in syn-clinal orientation. © 1994 John Wiley & Sons, Inc.


Introduction 

The intramolecular N···H—O hydrogen bonding in the neutral form of ω-amino acids has been the target of ab initio studies for a number of years. For glycine, one mirror symmetrical conformer with this H bond is found with split valence basis sets [1-3]; with basis sets that include polarization functions, the nature of this conformation changes to a transition state between two mutually mirror symmetrical local minima with slightly lower energy [4,5]. Polarized basis sets, however, apparently tend to produce unrealistic structures in such systems with intramolecular hydrogen bonds [6]. In β-alanine, one symmetry unique conformer with a N···H—O hydrogen bond is formed, which is connected with its mirror image by a reaction path that preserves the H bond [7]. In γ-aminobutyric acid (GABA), two mirror symmetrical conformers with this H bond are formed, each of which is connected with both the image and the mirror image of its counterpart in an H bond preserving reaction [8]. δ-Aminopentanoic acid follows this pattern: there are four symmetry unique H-bonded conformers that are interconnected in such a way that for each conformer three different H bond preserving reaction paths exist [9]. In all of these systems, the hydrogen bond can only be formed with the —COOH group in anti-periplanar orientation (i.e., the torsion angle H—O—C=O is close to 180°); this orientation is approximately 35 kJ/mol less stable than the syn-periplanar orientation with an H—O—C=O torsion angle close to 0°. The stabilization due to the hydrogen bond is slightly lower than this energy difference, so none of the H-bonded conformers is the global minimum in the respective potential energy surface.
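The orientation terms used above (syn-periplanar, syn-clinal, anti-periplanar) follow the Klyne-Prelog convention for torsion angles, which can be expressed as a small classifier. The sketch below assumes the usual 30°, 90°, and 150° boundaries. The conversion from the tabulated H3—O2—C1—C2 torsion to an approximate H—O—C=O torsion (adding 180°) additionally assumes a planar carboxyl group, so it is an approximation rather than a value from the source.

```python
def wrap(angle_deg):
    """Map any angle to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def klyne_prelog(torsion_deg):
    """Classify a torsion angle by the Klyne-Prelog ranges."""
    a = abs(wrap(torsion_deg))
    if a <= 30.0:
        return "syn-periplanar"
    if a < 90.0:
        return "syn-clinal"
    if a <= 150.0:
        return "anti-clinal"
    return "anti-periplanar"


def hoco_from_hocc(hocc_deg):
    """Approximate H-O-C=O torsion from H-O-C-C, ASSUMING a planar carboxyl group."""
    return wrap(hocc_deg + 180.0)


# H3-O2-C1-C2 torsions of conformers I and VI as listed in Table I.
for name, hocc in (("I", 1.58), ("VI", -140.60)):
    print(name, klyne_prelog(hoco_from_hocc(hocc)))
```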
The strength of the N···H—O interaction increases monotonically from glycine to δ-aminopentanoic acid [3,9], with changes of several hundred cm⁻¹ in the calculated O—H vibration frequency. This dependence of the H-bond strength on the ring size and the interesting topological pattern of the H bond preserving reaction paths led to an extension of this series of ab initio calculations to the next homolog, ε-aminohexanoic acid. The well-known RHF formalism [10] and the 4-31G basis set [11] were used in order to allow a comparison of the results with those of earlier work at one consistent level. All geometries were fully optimized with the program GAMESS [12] to remaining maximum and root mean square (rms) gradients less than 1 × 10⁻⁴ and 0.33 × 10⁻⁴ hartree bohr⁻¹, respectively. Local minima were verified to have only positive eigenvalues of the Hessian matrix, and saddle points were verified to have exactly one negative eigenvalue of the Hessian matrix. The Hessian matrix was obtained via numerical differentiation of analytical first derivatives; no scaling was performed in the subsequent vibration frequency analysis.
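The convergence and stationary-point criteria just described can be restated as two short checks. This is a sketch, not the procedure implemented in GAMESS: the function names are invented, the gradient is taken as a flat list of components in hartree/bohr, and the Hessian eigenvalues are assumed to be supplied with translational and rotational modes already removed (near-zero values are skipped as a precaution).

```python
import math

MAX_GRADIENT_TOL = 1.0e-4   # hartree/bohr, threshold quoted in the text
RMS_GRADIENT_TOL = 0.33e-4  # hartree/bohr, threshold quoted in the text


def is_converged(gradient):
    """True if both the maximum and the rms gradient component are below tolerance."""
    max_g = max(abs(g) for g in gradient)
    rms_g = math.sqrt(sum(g * g for g in gradient) / len(gradient))
    return max_g < MAX_GRADIENT_TOL and rms_g < RMS_GRADIENT_TOL


def classify_stationary_point(hessian_eigenvalues, zero_tol=1.0e-6):
    """Classify a stationary point by the number of negative Hessian eigenvalues."""
    internal = [e for e in hessian_eigenvalues if abs(e) > zero_tol]
    negatives = sum(1 for e in internal if e < 0.0)
    if negatives == 0:
        return "local minimum"
    if negatives == 1:
        return "first-order saddle point"
    return "higher-order saddle point"


print(is_converged([3.0e-5, -2.0e-5, 1.0e-5]))
print(classify_stationary_point([-0.3, 0.1, 0.5]))
```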


Results 

The potential energy surface of ε-aminohexanoic acid contains a total of 14 symmetry unique local minima that are stabilized by an intramolecular N···H—O hydrogen bond. In accordance with earlier work, the torsion angle N—C—C—C was chosen as the criterion for symmetry uniqueness. Conformers with a positive value of this angle are labeled I, II, ..., XIV in the following; conformers with a negative value are labeled Im, IIm, ..., XIVm. The geometry data of these local minima are collected in Table I; characteristic data related to the H bond are summarized in Table II.
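The labeling rule can be illustrated by computing a signed torsion angle from Cartesian coordinates and appending the suffix "m" when the angle is negative. The coordinates below are arbitrary test points, not atoms of ε-aminohexanoic acid, and the function names are hypothetical; the sign convention is the usual one, in which a mirror reflection reverses the sign of every torsion angle.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def torsion_angle(p1, p2, p3, p4):
    """Signed torsion angle p1-p2-p3-p4 in degrees, in the range (-180, 180]."""
    b0 = _sub(p1, p2)
    b1 = _sub(p3, p2)
    b2 = _sub(p4, p3)
    norm = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / norm for c in b1)
    # Components of the outer bonds perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def conformer_label(roman, n_c_c_c_torsion):
    """Plain label for a positive N-C-C-C torsion, suffix 'm' for the mirror image."""
    return roman if n_c_c_c_torsion > 0.0 else roman + "m"


points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
mirrored = [(x, y, -z) for x, y, z in points]  # reflection in the xy plane
print(torsion_angle(*points), torsion_angle(*mirrored))
```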

As a consequence of the intramolecular hydrogen bond, various internal rotations are coupled in most reaction paths of the H-bonded conformers. Table III lists the potential barriers for all of these reaction paths; reactions that preserve the H bond are also indicated in this table. In contrast to the homologs up to δ-aminopentanoic acid, the H bond preserving reactions of ε-aminohexanoic acid yield the very complex pattern that is displayed in Figure 1.
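A simple consistency check links Tables I and III: when two conformers are connected through the same saddle point, the difference between the forward and reverse barriers must equal the difference of their SCF energies. The sketch below applies this to two pairs, using SCF energies from Table I and barriers from Table III. It assumes that the barriers in the extracted Table III are grouped by conformer in ascending order and that both directions of a pair pass through the same saddle point; the conversion factor is the standard hartree-to-kJ/mol value.

```python
HARTREE_TO_KJ_PER_MOL = 2625.4996

# SCF energies (au) from Table I.
scf_energy = {
    "I": -438.3175141,
    "III": -438.3160495,
    "IV": -438.3158663,
    "VIII": -438.3134942,
}

# Potential barriers (kJ/mol) from Table III, keyed as (start, end).
# ASSUMPTION: values are attributed to conformers by their ascending order in the table.
barrier = {
    ("I", "IV"): 28.11,
    ("IV", "I"): 23.79,
    ("III", "VIII"): 17.80,
    ("VIII", "III"): 11.09,
}


def barrier_mismatch(a, b):
    """Forward-minus-reverse barrier, minus the SCF energy difference (kJ/mol)."""
    delta_e = (scf_energy[b] - scf_energy[a]) * HARTREE_TO_KJ_PER_MOL
    return (barrier[(a, b)] - barrier[(b, a)]) - delta_e


for pair in (("I", "IV"), ("III", "VIII")):
    print(pair, f"mismatch = {barrier_mismatch(*pair):+.3f} kJ/mol")
```

Mismatches at the level of the rounding of the tabulated barriers (0.01 kJ/mol) indicate that the two table entries describe the same reaction path seen from its two ends.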

The two conformers VIII and XIII are noteworthy because of the small value of the lowest potential barrier. In VIII, this value is 0.90 kJ/mol and corresponds to a combined rotation of the groups —OH, —COOH, and —NH2 that destroys the N···H—O interaction. The harmonic vibration frequency of this mode in VIII is 86.9 cm⁻¹, which is equivalent to a vibrational zero-point energy of 0.52 kJ/mol. By contrast, in XIII, the lowest potential barrier of 0.06 kJ/mol is caused by an H bond preserving reaction; the harmonic frequency of XIII associated with this mode is 40.2 cm⁻¹, which gives a zero-point energy of 0.24 kJ/mol. Even if
Table I. Geometry data and energies of all symmetry unique conformers in the potential energy surface of ε-aminohexanoic acid with a N···H—O hydrogen bond.

Conformer                    I            II           III            IV             V
SCF energy (au)   -438.3175141  -438.3173938  -438.3160495  -438.3158663  -438.3156863

Bond lengths (Å)
C6—N                    1.4737        1.4716        1.4693        1.4716        1.4765
C5—C6                   1.5277        1.5291        1.5327        1.5292        1.5342
C4—C5                   1.5455        1.5364        1.5334        1.5369        1.5381
C3—C4                   1.5353        1.5393        1.5361        1.5406        1.5319
C2—C3                   1.5432        1.5459        1.5266        1.5263        1.5439
C1—C2                   1.5100        1.5096        1.5194        1.5179        1.5071
H1—N                    1.0008        1.0004        1.0005        1.0001        1.0014
H2—N                    1.0003        1.0011        0.9998        1.0004        1.0013
H4—C6                   1.0861        1.0867        1.0864        1.0867        1.0860
H5—C6                   1.0815        1.0796        1.0804        1.0818        1.0817
H6—C5                   1.0859        1.0871        1.0859        1.0869        1.0850
H7—C5                   1.0842        1.0858        1.0855        1.0846        1.0847
H8—C4                   1.0840        1.0816        1.0824        1.0842        1.0860
H9—C4                   1.0823        1.0851        1.0850        1.0847        1.0859
H10—C3                  1.0838        1.0854        1.0820        1.0843        1.0812
H11—C3                  1.0849        1.0838        1.0842        1.0811        1.0834
H12—C2                  1.0795        1.0795        1.0840        1.0822        1.0783
H13—C2                  1.0822        1.0798        1.0831        1.0851        1.0836
O1—C1                   1.2065        1.2063        1.2068        1.2068        1.2044
O2—C1                   1.3372        1.3376        1.3359        1.3356        1.3418
H3—O2                   0.9748        0.9753        0.9719        0.9727        0.9718

Valence angles (degree)
C5—C6—N                 111.07        111.92        111.54        112.30        112.12
C4—C5—C6                113.73        117.41        114.15        113.39        115.20
C3—C4—C5                114.44        116.66        114.51        113.83        115.71
C2—C3—C4                114.89        117.21        112.98        113.54        113.85
C1—C2—C3                112.13        112.46        116.89        116.53        112.24
H1—N—C6                 112.85        113.36        113.82        113.43        112.45
H2—N—C6                 113.52        113.19        113.68        113.25        112.78
H4—C6—C5                109.87        108.30        109.01        109.45        109.35
H5—C6—C5                109.78        110.59        110.17        109.03        110.20
H6—C5—C4                110.56        107.86        108.40        109.96        110.60
H7—C5—C4                109.63        109.14        110.32        109.90        107.52
H8—C4—C3                108.03        110.13        109.66        108.38        108.22
H9—C4—C3                109.26        107.22        108.94        110.13        109.51
H10—C3—C4               109.41        109.44        110.80        108.91        109.00
H11—C3—C4               109.31        108.23        109.26        110.41        109.01
H12—C2—C3               108.97        111.34        110.24        111.03        110.87
H13—C2—C3               110.57        108.34        110.87        110.03        110.23
O1—C1—C2                122.33        122.64        120.21        120.61        124.10
O2—C1—C2                116.65        116.14        119.62        119.04        114.62
H3—O2—C1                118.40        117.38        123.16        122.63        116.64

Torsion angles (degree)
C4—C5—C6—N               71.29         53.20         43.16         58.33         90.77
C3—C4—C5—C6            -128.26         63.28         53.70        -90.15        -54.79
C2—C3—C4—C5              69.22        -82.02       -160.38        154.24        -66.47
C1—C2—C3—C4              67.55        -59.31         70.15        -66.31        149.63
H1—N—C6—C5               72.43        178.94         60.21         66.40         64.76
H2—N—C6—C5             -161.76         52.94       -172.82       -167.27       -170.83
H4—C6—C5—C4            -165.41        176.30        167.35       -177.23       -146.36
H5—C6—C5—C4             -47.72        -67.09        -75.81        -60.22        -29.12
H6—C5—C4—C3              -5.91       -174.21        175.70         33.28         69.19
H7—C5—C4—C3             110.33        -59.45        -68.29        149.77       -175.81
H8—C4—C3—C2            -169.22         44.40        -37.87        -83.42        171.95
H9—C4—C3—C2             -54.52        158.81         77.48         32.00         56.56
H10—C3—C4—C5           -169.05         41.44        -37.95        -84.71         56.82
H11—C3—C4—C5            -52.55        157.06         78.42         31.03        172.20
H12—C2—C3—C4           -174.28         64.16       -169.13         55.77        -89.18
H13—C2—C3—C4            -55.73       -177.36        -52.15        172.89         29.99
O1—C1—C2—C3              92.59        -92.18       -176.19        167.80        118.47
O2—C1—C2—C3             -85.93         86.64          5.47        -14.29        -61.86
H3—O2—C1—C2               1.58         -3.29         -9.86          9.12         -0.42

Distances (Å) less than 95.0% of sum of van der Waals radii
H3—N                    1.8213        1.8207        1.8726        1.8517        1.8665

H5— H12 

2.1170 

H5 — H10 

2.1837 

H3 — H8 

2.1997 

H3 — H10 

2.1895 

2.2080 

H3 — HI  1 

2.1224 

H13 — H6 

2.2329 

Additional  attractive  interactions  (A) 

02— H2 

2.9640 

H8 — N 

2.7782 

2.6570 

H9 — N 

2.8362 

2.9937 

H10 — N 

2.7793 

2.7724 

HI  1 — N 

2.5664 

H12 — N 

2.9250 

02— H8 

2.5386 

2.6397 

02— H9 

2.5790 

2.7043 

02— HIO 

2.7021 

2.5618 

02— H 1 1 

2.7024 

Conformer                    I            II           III            IV             V

Vibrational zero-point energies (kJ/mol)
                       553.710       554.178       552.516       552.316       553.624

Rotation constants (GHz)
                        0.8339        0.8777        0.7818        0.8018        0.7792
                        1.0797        1.0968        0.9628        0.9868        1.0086
                        2.4041        2.4186        2.7581        2.6078        2.5160
Table I. (Continued)

Conformer                   VI           VII          VIII            IX             X
SCF energy (au)   -438.3148969  -438.3146779  -438.3134942  -438.3131759  -438.3124873

Bond lengths (Å)
C6—N                    1.4754        1.4778        1.4768        1.4757        1.4773
C5—C6                   1.5404        1.5370        1.5330        1.5331        1.5287
C4—C5                   1.5416        1.5508        1.5400        1.5372        1.5470
C3—C4                   1.5410        1.5459        1.5514        1.5368        1.5370
C2—C3                   1.5396        1.5545        1.5523        1.5462        1.5485
C1—C2                   1.4978        1.4938        1.4924        1.5052        1.5124
H1—N                    0.9991        1.0016        1.0003        1.0010        0.9993
H2—N                    0.9992        0.9995        1.0001        1.0012        0.9988
H4—C6                   1.0865        1.0863        1.0874        1.0860        1.0811
H5—C6                   1.0785        1.0817        1.0790        1.0808        1.0861
H6—C5                   1.0851        1.0824        1.0875        1.0867        1.0803
H7—C5                   1.0801        1.0845        1.0862        1.0848        1.0843
H8—C4                   1.0867        1.0842        1.0791        1.0843        1.0841
H9—C4                   1.0858        1.0806        1.0845        1.0858        1.0863
H10—C3                  1.0848        1.0831        1.0842        1.0825        1.0839
H11—C3                  1.0804        1.0839        1.0830        1.0825        1.0842
H12—C2                  1.0849        1.0793        1.0799        1.0812        1.0824
H13—C2                  1.0804        1.0798        1.0795        1.0781        1.0800
O1—C1                   1.2096        1.2129        1.2110        1.2038        1.2072
O2—C1                   1.3433        1.3404        1.3451        1.3416        1.3354
H3—O2                   0.9734        0.9801        0.9758        0.9709        0.9770

Valence angles (degree)
C5—C6—N                 110.93        111.94        113.27        112.88        112.08
C4—C5—C6                115.28        115.35        118.16        117.58        114.27
C3—C4—C5                117.43        113.70        115.16        115.75        115.65
C2—C3—C4                116.13        114.71        115.07        113.84        115.85
C1—C2—C3                112.73        109.66        108.49        111.44        114.27
H1—N—C6                 113.83        112.73        112.47        112.42        113.47
H2—N—C6                 114.04        113.24        113.36        112.43        113.85
H4—C6—C5                109.53        109.30        107.43        108.45        109.04
H5—C6—C5                110.64        110.33        111.02        110.28        110.02
H6—C5—C4                107.61        110.47        108.26        109.10        110.63
H7—C5—C4                109.76        108.24        108.58        107.46        109.36
H8—C4—C3                107.66        108.67        110.10        108.62        107.66
H9—C4—C3                108.61        109.24        108.47        109.76        108.31
H10—C3—C4               107.40        109.72        110.39        109.67        107.78
H11—C3—C4               110.47        109.44        109.51        109.01        110.20
H12—C2—C3               108.47        109.67        109.38        110.17        110.18
H13—C2—C3               111.64        109.60        110.14        110.48        108.24
O1—C1—C2                124.76        124.76        124.29        124.44        121.53
O2—C1—C2                112.38        112.19        112.68        114.01        117.50
H3—O2—C1                115.17        113.98        113.15        115.18        118.44

Torsion angles (degree)
C4—C5—C6—N              138.89         90.10         66.32         83.79         60.14
C3—C4—C5—C6             -62.61       -144.59         64.30        -65.82       -124.41
C2—C3—C4—C5             -61.97         90.86       -117.35         92.47         71.90
C1—C2—C3—C4              64.96        -43.79         59.79       -150.17        -87.59
H1—N—C6—C5              175.79         62.47        174.08         57.09        149.83
H2—N—C6—C5               47.43       -171.25         47.95       -178.86        -83.36
H4—C6—C5—C4             -97.49       -147.28       -171.34       -153.77        178.92
H5—C6—C5—C4              20.47        -29.81        -54.75        -37.12        -63.75
H6—C5—C4—C3             175.56        -21.44       -172.23         58.66         -1.76
H7—C5—C4—C3              61.26         93.84        -57.68        173.10        114.76
H8—C4—C3—C2             174.99       -148.56          7.36       -147.40       -166.13
H9—C4—C3—C2              60.50        -32.76        123.21        -31.93        -52.62
H10—C3—C4—C5            177.73       -148.48          5.23       -146.62       -168.20
H11—C3—C4—C5             61.93        -31.79        121.83        -30.79        -52.59
H12—C2—C3—C4           -175.49         75.59        179.15        -30.23         37.57
H13—C2—C3—C4            -56.78       -164.32        -60.43         89.01        155.16
O1—C1—C2—C3            -129.69        -85.51         86.42       -109.61        -85.95
O2—C1—C2—C3              48.53         88.96        -87.48         69.38         93.73
H3—O2—C1—C2            -140.60       -138.63        131.03          2.30         -9.49

Distances (Å) less than 95.0% of sum of van der Waals radii
H3—N                    1.9468        1.8283        1.8900        1.8679        1.7942

H5 — HI  1 

2.2309 

02— H9 

2.4263 

02— H8 

2.3567 

Additional  attractive  interactions  (A) 

H8 — N 

2.9390 

H9 — N 

2.7184 

HI  I — N 

2.9718 

H12 — N 

2.8558 

01—  H5 

2.6010 

02— H5 

2.6737 

Ol— H6 

2.5998 

02— H6 

2.5297 

Ol— H7 

2.5686 

02— H7 

2.8328 

Ol— Hll 

2.9874 

02— Hll 

2.5210 

2.6356 

Conformer                   VI           VII          VIII            IX             X

Vibrational zero-point energies (kJ/mol)
                       551.663       552.594       551.547       553.607       553.201

Rotation constants (GHz)
                        1.0246        1.0057        1.0392        0.7805        0.9052
                        1.4497        1.3803        1.4313        1.0053        1.1678
                        1.8630        2.0304        1.9354        2.5159        2.1733
Table I. (Continued)

Conformer                   XI           XII          XIII           XIV
SCF energy (au)   -438.3124520  -438.3123593  -438.3050339  -438.3032209

Bond lengths (Å)
C6—N                    1.4781        1.4760        1.4793        1.4701
C5—C6                   1.5466        1.5342        1.5407        1.5397
C4—C5                   1.5540        1.5388        1.5481        1.5473
C3—C4                   1.5467        1.5396        1.5430        1.5397
C2—C3                   1.5409        1.5335        1.5438        1.5438
C1—C2                   1.4932        1.5113        1.5081        1.5132
H1—N                    0.9998        1.0009        1.0014        0.9998
H2—N                    0.9997        0.9983        0.9995        0.9981
H4—C6                   1.0862        1.0849        1.0852        1.0856
H5—C6                   1.0772        1.0825        1.0795        1.0806
H6—C5                   1.0813        1.0827        1.0804        1.0851
H7—C5                   1.0839        1.0852        1.0837        1.0842
H8—C4                   1.0841        1.0857        1.0839        1.0856
H9—C4                   1.0862        1.0864        1.0852        1.0852
H10—C3                  1.0845        1.0820        1.0844        1.0832
H11—C3                  1.0818        1.0846        1.0828        1.0786
H12—C2                  1.0801        1.0871        1.0781        1.0837
H13—C2                  1.0842        1.0803        1.0816        1.0782
O1—C1                   1.2123        1.2065        1.2055        1.2051
O2—C1                   1.3392        1.3369        1.3406        1.3406
H3—O2                   0.9792        0.9726        0.9716        0.9721

Valence angles (degree)
C5—C6—N                 111.95        112.54        110.47        113.71
C4—C5—C6                116.18        114.21        117.62        119.63
C3—C4—C5                116.12        117.67        119.78        119.66
C2—C3—C4                116.59        115.75        118.48        116.55
C1—C2—C3                112.32        118.84        113.85        113.87
H1—N—C6                 113.63        113.17        112.68        113.64
H2—N—C6                 113.02        114.09        113.80        114.73
H4—C6—C5                109.40        109.56        110.19        107.85
H5—C6—C5                110.97        109.60        111.21        110.22
H6—C5—C4                108.88        111.46        108.76        106.34
H7—C5—C4                108.02        107.18        107.03        109.24
H8—C4—C3                106.76        108.74        108.34        106.95
H9—C4—C3                109.86        107.87        106.25        109.03
H10—C3—C4               108.24        110.56        109.73        107.55
H11—C3—C4               109.10        107.54        106.68        108.53
H12—C2—C3               111.08        108.48        110.46        110.92
H13—C2—C3               108.77        110.33        108.08        111.44
O1—C1—C2                124.29        121.33        123.03        122.57
O2—C1—C2                112.96        118.09        115.61        116.76
H3—O2—C1                113.36        121.10        115.81        119.57

Torsion angles (degree)
C4—C5—C6—N              128.71         87.57        113.72         31.19
C3—C4—C5—C6            -102.61        -60.94        -33.79         39.59
C2—C3—C4—C5              86.46        -83.44        -43.46         41.16
C1—C2—C3—C4             -52.94         67.85        -42.14       -131.07
H1—N—C6—C5               61.10         86.18         69.18        165.13
H2—N—C6—C5             -171.71       -146.78       -164.81         36.55
H4—C6—C5—C4            -107.92       -149.36       -123.52        155.29
H5—C6—C5—C4               9.71        -32.56         -5.50        -88.99
H6—C5—C4—C3              20.54         63.58         89.14        162.84
H7—C5—C4—C3             135.67        179.21       -157.05        -83.81
H8—C4—C3—C2            -153.58         40.67         79.49        165.35
H9—C4—C3—C2             -39.37        155.20       -167.12        -81.20
H10—C3—C4—C5           -153.24         43.32         79.78        163.25
H11—C3—C4—C5            -37.41        158.20       -165.62        -83.13
H12—C2—C3—C4             67.72       -171.69         81.58        -10.30
H13—C2—C3—C4           -173.43        -55.48       -161.10        108.19
O1—C1—C2—C3             124.60       -151.07        -72.40       -136.38
O2—C1—C2—C3             -52.41         33.20        107.32         45.63
H3—O2—C1—C2             143.61        -21.16         -9.26         -3.24

Distances (Å) less than 95.0% of sum of van der Waals radii
H3—N                    1.8612        1.8229        1.8972        1.8551

H3— H10 

2.1595 

H3 — H6 

2.2709 

H3 — H12 

2.1457 

H3 — HI  1 

2.0412 

Additional  attractive  interactions  (A) 

02— HI 

2.7576 

H10 — N 

2.6095 

HI  1 — N 

2.6627 

HI  2— N 

2.6580 

01— H5 

2.5088 

02— H5 

2.7890 

02— H6 

2.6097 

2.6505 

2.5409 

02— H10 

2.7797 

02— Hll 

2.6741 

2.4741 

Conformer                   XI           XII          XIII           XIV

Vibrational zero-point energies (kJ/mol)
                       552.122       551.661       553.961       553.571

Rotation constants (GHz)
                        1.0357        0.9093        0.9843        0.8693
                        1.5181        1.1179        1.2592        1.0792
                        1.8666        2.2306        2.0800        2.3979

HYDROGEN BONDING IN ε-AMINOHEXANOIC ACID


88 


RAMEK 


Table III. Potential barriers (kJ/mol) of all reaction paths of conformers

Conformer   Potential barrier   Reaction path description

I                 22.93         2;
                  26.47         10, 11, 14; leads to IIm
                  28.11         4, 5, 8, 10; leads to IV
                  36.71         1, 4, 9;
                  42.89         6, 7, 10; leads to V
                  44.41         3, 12;
                  53.20         9;

II                30.04         3, 6, 7; leads to III
                  32.51         8, 9, 12; leads to XIII
                  32.61         1, 4;
                  33.24         10, 11, 14; leads to Im
                  36.38         3, 6, 9, 12; leads to XII
                  37.19         2, 3;
                  43.58         5, 8; leads to XIV

III               11.82         1, 4;
                  17.80         2, 3; leads to VIII
                  21.39         10, 11, 13; leads to IVm
                  26.52         4, 5, 8; leads to II
                  26.56         8, 9, 12; leads to XII
                  28.39         3, 6, 8, 9, 12; leads to V
                  34.60         3, 8, 10, 11; leads to Xm
                  42.17         7, 4;

IV                15.53         4, 5, 7, 12; leads to IX
                  19.82         2, 7, 12; leads to XI
                  20.66         4, 7, 9; leads to X
                  20.91         10, 11, 13; leads to IIIm
                  23.79         3, 6, 7; leads to I
                  24.69         1, 4, 7, 9, 12; leads to VII
                  45.24         8;

V                 10.02         1, 4, 5; leads to XII
                  27.43         7, 10, 11; leads to III
                  29.94         2;
                  34.50         3;
                  38.09         5, 8, 9, 11; leads to I
                  40.15         6, 12;
                  44.54         6;

VI                 2.45         1, 7, 12;
                  13.97         2, 7, 11; leads to XII
                  23.42         1, 4, 5, 12;
                  33.18         1, 8, 10, 11;
                  34.83         8, 9, 11;

VII                5.92         1, 4, 8, 9, 11;
                   6.70         1, 9, 11;
                   9.15         1, 3, 6, 7, 9, 11;
                  12.31         1, 4, 5, 10, 12;
                  17.88         2, 5, 10, 11; leads to X
                  21.57         2, 3, 8, 10, 11; leads to IV
Table III.  (Continued)

Conformer   Potential barrier   Reaction path description

VIII             0.90           2, 3, 7;
                11.09           1, 4; leads to III
                17.70           2, 5, 8, 4;
                24.24           3, 8, 9, 12;
                35.23           2, 7, 10, 11;

IX               9.00           3, 6, 8, 11; leads to IV
                14.83           6, 9, 11; leads to X
                23.73           6, 7, 10, 12; leads to XIII
                29.19           6, 7, 10, 11; leads to XIV
                30.66           2, 3, 6, 9, 12; leads to XI
                44.25           4, 7, 9, 12;
                44.60           5, 9;

X               11.79           2, 3, 8, 10; leads to IV
                12.13           1, 6, 9, 12; leads to VII
                13.02           5, 10; leads to IX
                20.96           4;
                24.97           9;
                25.25           10, 11; leads to IIIᵐ

XI               5.85           2, 5, 7, 9, 12;
                10.62           1, 8, 11; leads to IV
                25.88           2, 3, 6, 7, 10, 12;
                28.76           1, 4, 5, 10, 11; leads to IX

XII              1.29           2, 3, 6; leads to V
                 7.30           1, 8, 12; leads to VI
                16.87           7, 10, 11; leads to III
                23.16           4, 5, 10, 11; leads to II
                23.16           4, 5, 8, 10, 12; leads to XIII
                26.53           2, 3, 6, 11; leads to XIVᵐ

XIII             0.06           7, 10, 11; leads to II
                 2.36           5, 8, 9, 11; leads to IX
                 3.93           3, 6, 7, 9, 11; leads to XII

XIV              2.53           2, 4, 6, 10, 11; leads to XIIᵐ
                 3.06           5, 8, 9, 12; leads to IX
                 6.37           6, 7; leads to II
                24.46           1, 5;

ᵃ In most of these reactions, several internal rotations are coupled as a consequence of the intramolecular N···H-O hydrogen bond. Internal rotations are labeled as follows: 1, 2, change of dihedral H-O-C1-C2; 3, 4, change of dihedral O=C1-C2-C3; 5, 6, change of dihedral C1-C2-C3-C4; 7, 8, change of dihedral C2-C3-C4-C5; 9, 10, change of dihedral C3-C4-C5-C6; 11, 12, change of dihedral C4-C5-C6-N; 13, 14, change of dihedrals C5-C6-N-H; odd/even numbers indicate decrease/increase of the respective dihedral angle. For reactions that preserve the hydrogen bond, the target conformer is also listed.
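The labeling scheme in the footnote can be sketched as a small lookup, shown here only as an illustration of how a path entry such as "10, 11, 14" reads. The function name `decode_path` and the dictionary layout are hypothetical, and the dihedral for labels 9 and 10 is assumed to be C3-C4-C5-C6, following the order of the chain.

```python
# Internal-rotation labels from the Table III footnote. Each dihedral owns an
# odd label (decrease) and the following even label (increase).
DIHEDRALS = {
    (1, 2): "H-O-C1-C2",
    (3, 4): "O=C1-C2-C3",
    (5, 6): "C1-C2-C3-C4",
    (7, 8): "C2-C3-C4-C5",
    (9, 10): "C3-C4-C5-C6",  # assumption: chain order, see lead-in
    (11, 12): "C4-C5-C6-N",
    (13, 14): "C5-C6-N-H",
}


def decode_path(labels):
    """Translate rotation labels into (dihedral, direction) pairs."""
    steps = []
    for label in labels:
        for (odd, even), dihedral in DIHEDRALS.items():
            if label in (odd, even):
                direction = "decrease" if label == odd else "increase"
                steps.append((dihedral, direction))
                break
        else:
            raise ValueError(f"unknown rotation label: {label}")
    return steps


# Path "10, 11, 14" (conformer I, leads to the mirror image of II).
for dihedral, direction in decode_path([10, 11, 14]):
    print(f"{direction:8s} of {dihedral}")
```

Read this way, each entry of the table is a list of coupled torsional changes rather than a single rotation.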
Figure 1.  All reaction paths between conformers I-XIV and their mirror images Iᵐ-XIVᵐ that preserve the N···H-O hydrogen bond.


an extraordinary degree of anharmonicity is assumed, the zero-point energy is still larger than the potential barrier in this case. Hence, XIII cannot be classified as a stable conformer, although the potential energy surface has a true local minimum at this conformation. The low barrier of 0.06 kJ/mol also influences the reaction paths of XIII to a large extent. In contrast to the other local minima, XIII can access only three distinct reaction paths that lead to saddle points.

Another feature that deserves special attention is the orientation of the groups C=O and O-H. This orientation is characterized by an H-O-C=O torsion angle around 180° in 10 of the 14 local minima; in VI, VII, VIII, and XI, however, the absolute value of the H-O-C=O torsion angle is around 40° (see Discussion section).

The stability of the H-bonded local minima of ε-aminohexanoic acid is judged quite differently by the usual criteria for hydrogen bonds. Ordering according to SCF energy gives the stability ranking

I  >  II  >  III  >  IV  >  V  >  VI  >  VII  >  VIII  >  IX  >  X  >  XI  >  XII  XIII  >  XIV ; 

inclusion  of  vibrational  zero-point  energies  changes  this  ranking  slightly  to 

I  >  II  >  III  >  IV  >  V  >  VI  >  VII  >  VIII  >  IX  >  XII  >  XI  >  X  >  XIII  >  XIV. 

Different rankings are obtained if bond lengths and electron densities are considered: The O-H bond length (with values for VI, VII, VIII, and XI reduced by the mean difference between syn and anti orientation of the groups C=O and O-H) yields the ranking

X > II ≈ VII > I > XI > IV ≈ XII > XIV > III ≈ V ≈ XIII > IX ≈ VIII > VI,

the electron density in the critical point that characterizes the N-H hydrogen bond [13] gives the ranking

X > II > VII > I > XII > XI > IV > XIV > V > IX > III > VIII > VI > XIII,

and the N-H distance gives the ranking

X > II > I > XII > VII > IV > XIV > XI > V > IX > III > VIII > XIII > VI.

The N-H bond order [14] yields another different ranking, namely,

VII > XI > I ≈ X > II ≈ V > IX > VIII ≈ XII > XIII > III > VI > IV > XIV.

The height of the lowest potential barrier, which determines kinetic stability, gives yet another ranking:

II > I > IV > III ≈ X > V > IX > VII ≈ XI > XIV ≈ VI > XII > VIII > XIII.
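As an illustration of how the kinetic ranking follows from the barrier table, and of how far it departs from the energy ordering, the sketch below sorts the conformers by their lowest tabulated barrier and compares the result with the SCF ranking through Spearman's rank correlation. The function names are hypothetical, near-ties are treated as strict orderings, and XII is assumed to precede XIII in the SCF ranking.

```python
# Lowest potential barrier (kJ/mol) of each conformer, read from Table III.
LOWEST_BARRIER = {
    "I": 22.93, "II": 30.04, "III": 11.82, "IV": 15.53, "V": 10.02,
    "VI": 2.45, "VII": 5.92, "VIII": 0.90, "IX": 9.00, "X": 11.79,
    "XI": 5.85, "XII": 1.29, "XIII": 0.06, "XIV": 2.53,
}

# SCF-energy ranking quoted in the text, most stable first
# (assumption: XII ranks ahead of XIII).
SCF_RANKING = ["I", "II", "III", "IV", "V", "VI", "VII",
               "VIII", "IX", "X", "XI", "XII", "XIII", "XIV"]


def kinetic_ranking(barriers):
    """Order conformers from highest to lowest minimum barrier."""
    return sorted(barriers, key=barriers.get, reverse=True)


def spearman_rho(ranking_a, ranking_b):
    """Spearman rank correlation of two tie-free rankings of the same items."""
    n = len(ranking_a)
    position_b = {name: i for i, name in enumerate(ranking_b)}
    d2 = sum((i - position_b[name]) ** 2 for i, name in enumerate(ranking_a))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


kinetic = kinetic_ranking(LOWEST_BARRIER)
print(" > ".join(kinetic))
print(f"Spearman rho vs. SCF ranking: {spearman_rho(SCF_RANKING, kinetic):.3f}")
```

A coefficient well below 1 is one simple way to express that the two criteria order the conformers differently.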

Discussion 

Figure 2 displays a comparison of characteristic data of the N···H-O hydrogen bond in the series glycine, β-alanine, γ-aminobutyric acid, δ-aminopentanoic acid, and ε-aminohexanoic acid. This comparison shows that the strength of the hydrogen bond increases up to δ-aminopentanoic acid and then decreases from δ-aminopentanoic acid to ε-aminohexanoic acid. At the same time, the difference between the strongest and weakest conformers in terms of the displayed quantities is approximately constant for γ-aminobutyric acid, δ-aminopentanoic acid, and ε-aminohexanoic acid. From these facts, it is evident that the eight-membered ring, which is formed in δ-aminopentanoic acid, is the one with minimal constraints. Neither γ-aminobutyric acid nor ε-aminohexanoic acid can form more stable hydrogen bonds, but for opposite reasons: In γ-aminobutyric acid, the carbon chain is too short to allow closer contact between the two functional groups, and in ε-aminohexanoic acid, the chain is already too long. As a consequence, most H-bonded conformers of ε-aminohexanoic acid have rather distorted geometries with one main chain torsion angle around ±120°, which in the absence of intramolecular interactions is a typical value for transition states. Also, most H-bonded conformers contain repulsive interactions between hydrogen atoms with H-H distances as low as 85% of the sum of the van der Waals radii (cf. Table I).
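The percentage criterion can be checked with a few lines of arithmetic. The sketch below is an illustration only: it assumes Bondi's van der Waals radii, which the text does not name, and uses contact distances from the tabulated data for conformers XI-XIV. The helper `vdw_fraction` is a hypothetical name.

```python
# Van der Waals radii in angstroms. Bondi's values are an assumption here;
# the source does not state which radii were used.
VDW_RADIUS = {"H": 1.20, "N": 1.55, "O": 1.52}


def vdw_fraction(element_a, element_b, distance):
    """Return distance / (sum of van der Waals radii) for one contact."""
    return distance / (VDW_RADIUS[element_a] + VDW_RADIUS[element_b])


# Contacts copied from the table for conformers XI-XIV (distances in angstroms).
contacts = [
    ("H3-N", "H", "N", 1.8229),
    ("H3-H11", "H", "H", 2.0412),
    ("H3-H10", "H", "H", 2.1595),
    ("H3-H6", "H", "H", 2.2709),
]

for label, a, b, d in contacts:
    print(f"{label:7s} {d:.4f} A  {100 * vdw_fraction(a, b, d):5.1f}% of vdW sum")
```

With these radii the shortest H-H contact comes out near 85% of the radii sum, in line with the figure quoted above.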

The length of the carbon chain in ε-aminohexanoic acid is, however, just long enough to allow nine-membered rings with the -COOH group in the distorted syn orientation in VI, VII, VIII, and XI. This is a remarkable contrast to glycine, β-alanine, γ-aminobutyric acid, and δ-aminopentanoic acid, for which, as already
Figure 2.  O-H bond length and harmonic vibration frequency and N-H distance of all H2N-(CH2)n-COOH conformers with an intramolecular N···H-O hydrogen bond: (•) anti-periplanar orientation of C=O and O-H; (○) syn-clinal orientation of C=O and O-H.


mentioned, the -COOH group occurs exclusively in anti-periplanar orientation in all H-bonded conformers. The syn orientation in VI, VII, VIII, and XI does not lead to lower energies, because it is too distorted from the H-O-C=O ≈ 0° orientation that is the most stable one in the absence of any interactions that involve the -COOH group. The large amount of distortion is also evident from the pattern of hydrogen-bond-preserving reactions: VI and VIII both exhibit only one H-bond-preserving reaction path, VII and XI have two such reaction paths, whereas all ε-aminohexanoic acid conformers with H-O-C=O ≈ 180° take part in three or more such reaction paths.

If the trends that can be gathered from ε-aminohexanoic acid and its homologs with fewer carbon atoms are extrapolated further, one is led to the expectation that similar 10-membered rings will allow hydrogen-bonded conformers in which the terminal groups are present in their energetically most favorable orientation. In such conformers, the stabilization due to the hydrogen bond would therefore not be canceled by steric hindrance or unfavorable orientations, as is the case in all ω-amino acids up to and including ε-aminohexanoic acid. This expectation should also be generalizable to other organic compounds with intramolecular hydrogen bonds, because the specific interaction N···H-O in the ω-amino acids has been found not to be the dominating structural feature [15] in this class of compounds. These considerations, therefore, might be an explanation for the fact that 10-membered hydrogen-bonded rings occur so often in peptides and proteins.
Abstract

The ability of the PM3 semiempirical quantum mechanical method to reproduce hydrogen bonding in nucleotide base pairs was assessed. Results of PM3 calculations on the nucleotides 2'-deoxyadenosine 5'-monophosphate (pdA), 2'-deoxyguanosine 5'-monophosphate (pdG), 2'-deoxycytidine 5'-monophosphate (pdC), and 2'-deoxythymidine 5'-monophosphate (pdT) and the base pairs pdA-pdT, pdG-pdC, and pdG(syn)-pdC are presented and discussed. The PM3 method is the first of the parameterized NDDO quantum mechanical models with any ability to reproduce hydrogen bonding between nucleotide base pairs. Intermolecular hydrogen bond lengths between nucleotides displaying Watson-Crick base pairing are 0.1-0.2 Å less than experimental results. Nucleotide bond distances, bond angles, and torsion angles about the glycosyl bond (χ), the C4'-C5' bond (γ), and the C5'-O5' bond (β) agree with experimental results. There are many possible conformations of nucleotides. PM3 calculations reveal that many of the most stable conformations are stabilized by intramolecular C-H···O hydrogen bonds. These interactions disrupt the usual sugar puckering. The stacking interactions of a dT-pdA duplex are examined at different levels of gradient optimization. The intramolecular hydrogen bonds found in the nucleotide base pairs disappear in the duplex, as a result of the additional constraints on the phosphate group when part of a DNA backbone. Sugar puckering is reproduced by the PM3 method for the four bases in the dT-pdA duplex. PM3 underestimates the attractive stacking interactions of base pairs in a B-DNA helical conformation. The performance of the PM3 method implemented in SPARTAN is contrasted with that implemented in MOPAC. At present, accurate ab initio calculations are too time-consuming to be of practical use, and molecular mechanics methods cannot be used to determine quantum mechanical properties such as reaction-path calculations, transition-state structures, and activation energies. The PM3 method should be used with extreme caution for examination of small DNA systems. Future parameterizations of semiempirical methods should incorporate base stacking interactions into the parameterization data set to enhance the ability of these methods. © 1994 John Wiley & Sons, Inc.


Introduction 

Quantum mechanical methods have been applied to the study of purine and pyrimidine bases and nucleosides [1a-l] as well as to hydrogen-bonded dimers of these species [2a-r]. The molecular structures of purines, pyrimidines, nucleosides, and nucleotides and their intermolecular complexes are the subject of several monographs [3a-d]. Recently, it was demonstrated that the PM3 semiempirical quantum mechanical method can calculate intermolecular hydrogen bonding in small polar molecules [4a-e]. In this article, results of PM3 calculations on the nucleotides 2'-deoxyadenosine 5'-monophosphate (pdA), 2'-deoxyguanosine 5'-monophosphate (pdG), 2'-deoxycytidine 5'-monophosphate (pdC), and 2'-deoxythymidine 5'-monophosphate (pdT) and the base pairs pdA-pdT, pdG-pdC, and pdG(syn)-pdC are presented and discussed. In addition, the effect of stacking DNA bases was assessed by PM3 calculations of a dT-pdT single strand and a dT-pdA duplex. This research was performed to assess the ability of the PM3 quantum mechanical method to reproduce the hydrogen-bonded structures and energies of nucleotide base pairs and to assess the ability of the method to model base stacking.

Method 

The nucleotides pdA, pdG, pdC, and pdT were built with Chem 3D Plus software (Cambridge Scientific Computing, Inc., Cambridge, MA) on a Macintosh II computer (Apple Computer, Inc., Cupertino, CA). The nucleotides were built in the anti orientation about the glycosyl C1' sugar-N base linkage. In addition, a monomer of pdG in the syn orientation was built. All nucleotide structures were designed as neutral species, by placing hydrogens on the appropriate phosphate oxygens. The five monomers were fully geometry-optimized with the PM3 method [5] using MOPAC 5.0 and 6.0 software [6], with the keywords PRECISE and NOMM. Base pairs were constructed from optimized monomers with initial hydrogen bond angles of 180° and hydrogen bond distances of 1.7 Å and minimized in the same fashion. All structures were characterized as stationary points and true minima using the keyword FORCE. Hydrogen bond energies were calculated by comparing the heats of formation of the base pairs with the 1SCF heats of formation for the monomers when "frozen" in their base pair-optimized geometry. The "frozen" monomers were then allowed to fully relax to calculate a second minimum for each nucleotide. In addition, after examining the results of these calculations, several optimized monomer structures were perturbed slightly, removing or adding intramolecular hydrogen bonds. These structures were fully optimized to assess differences in local minima. Calculations were performed using the Cray Y-MP supercomputer (Pittsburgh Supercomputer Center) and a VAX 4300 (Lake Forest College).

In addition, several charged and neutral pdA-pdT dimers were calculated with MOPAC. The lowest energy charged and neutral structures were used as input for PM3 calculations within SPARTAN (Wavefunction, Irvine, CA). SPARTAN was also used to calculate the energies and structures of stacked nucleotide bases and base pairs. The starting structure was obtained from the middle portion of an A-tract DNA dodecamer crystal structure [7], with methyl groups placed in the C5' position to replace the 5' phosphate and a hydrogen replacing the phosphate attached to the O3' of the sugar. Hydrogens were added to the crystal structure to obtain a reasonable starting structure. All SPARTAN calculations were performed using an INDY workstation (Silicon Graphics, Mountain View, CA).

Results 

The geometry-optimized structures of the pdA-pdT, pdG-pdC, and pdG(syn)-pdC neutral base pairs are presented in Figure 1. All hydrogens within 2 Å of another atom are illustrated by dashed lines. The hydrogen bonds responsible for holding the base pairs together are labeled with the appropriate hydrogen bond
Figure 1.  PM3-calculated structures of the nucleotide base pairs (A) pdA-pdT, (B) pdG-pdC, and (C) pdG(syn)-pdC. (•) C; (O) H; (©) N; (@) O; (®) P.


distance and bond angle in the figure. Intermolecular hydrogen bond lengths range from 1.78 to 1.84 Å and hydrogen bond angles are between 170 and 178° for the three nucleotide base pairs. Torsion angles most relevant to nucleotide structure [3c] and sugar-puckering values are reported in Table I. In addition, the top third of the table reports the minimized heats of formation for each optimized monomer, the middle third lists the heats of formation for each monomer when frozen in the dimer configuration (the 1SCF calculations), and the bottom of the table contains
Table I.  Torsion angles (degrees), sugar-puckering values, and heats of formation (kcal mol⁻¹) for PM3-calculated nucleotide dimers and monomers.

Monomers (optimized)ᵃ

Parameter                 pdA       pdT       pdG       pdC     pdG(syn)
χ (O4'-C1'-N1-C2)                 -122.8              -128.9
χ (O4'-C1'-N9-C4)       -133.2              -126.6                55.8
γ (O5'-C5'-C4'-C3')       36.6      44.7      42.5      48.4     -61.7
β (P-O5'-C5'-C4')       -132.4    -113.6    -112.0    -115.3     169.2
P                        -24.0      43.3      63.5      64.0      64.6
νmax                      22.4     -7.15    -12.09     -5.02     -8.85
ΔHf                     -276.0   -408.06   -323.96   -344.67   -326.58

Dimers

Parameter                 pdA       pdT       pdG       pdC     pdG(syn)  pdC (in G syn bp)
χ (O4'-C1'-N1-C2)                 -122.0              -132.6              -125.4
χ (O4'-C1'-N9-C4)       -134.3              -137.7                62.6
γ (O5'-C5'-C4'-C3')       49.2      47.7      47.8      50.5     -66.3      53.0
β (P-O5'-C5'-C4')       -160.3    -113.9    -112.4    -101.6     166.0    -156.3
P                        -19.6      48.3       b        60.9      55.3      48.3
νmax                     15.92     -6.01       b       -5.35    -12.30     -3.96
ΔHf (1SCF)ᶜ             -277.5    -407.2    -322.0    -342.2    -325.0    -343.6

Monomers (fully relaxed from dimer conformation)

Parameter                 pdA       pdT       pdG       pdC     pdG(syn)  pdC (in G syn bp)
χ (O4'-C1'-N1-C2)                 -122.6              -135.2              -131.9
χ (O4'-C1'-N9-C4)       -134.7              -137.6                62.5
γ (O5'-C5'-C4'-C3')       48.5      46.2      47.0      52.0     -66.2      54.4
β (P-O5'-C5'-C4')       -157.7    -113.9    -112.5    -101.3     165.8    -157.4
P                        -26.4      53.5     -60.1      69.6      54.3      73.1
νmax                     13.17     -5.88      9.23     -4.89    -11.82     -2.75
ΔHf                     -278.0   -408.07    -324.2    -343.6    -326.7    -346.3

ᵃ The same initial pdC was used to build the pdG-pdC and pdG(syn)-pdC dimers.
ᵇ ν2 = 0; P is undefined.
ᶜ Heat of formation for the monomer when "frozen" in the dimer conformation.

the heats of formation for each monomer after full minimization starting from the dimer configuration.

The PM3 heats of formation for the pdA-pdT, pdG-pdC, and pdG(syn)-pdC base pairs are -691.42, -679.79, and -683.56 kcal mol⁻¹, respectively. The heats of association, i.e., the difference between the base-pair heat of formation and the sum of the heats of formation of the individual nucleotides "frozen" in the base-pair minimum structure, are -6.72, -15.59, and -14.96 kcal mol⁻¹ for the pdA-pdT, pdG-pdC, and pdG(syn)-pdC base pairs, respectively. Dividing the heats of association by the number of hydrogen bonds allows assignment of -3.36 kcal mol⁻¹ per hydrogen bond between pdA and pdT, -5.20 kcal mol⁻¹ per hydrogen bond between pdG and pdC, and -4.99 kcal mol⁻¹ per hydrogen bond between pdG(syn) and pdC.
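The arithmetic behind these numbers is short enough to write out. The sketch below uses the base-pair heats of formation quoted in the text and the 1SCF monomer values from Table I. The hydrogen-bond counts (2 for pdA-pdT, 3 for the two pdG-pdC pairs) are inferred from the quoted per-bond values, and the function names are illustrative only.

```python
# PM3 heats of formation in kcal/mol, copied from the text and Table I.
# Each entry: (dimer dHf, frozen-monomer 1SCF dHf values, number of H bonds).
BASE_PAIRS = {
    "pdA-pdT": (-691.42, (-277.5, -407.2), 2),
    "pdG-pdC": (-679.79, (-322.0, -342.2), 3),
    "pdG(syn)-pdC": (-683.56, (-325.0, -343.6), 3),
}


def heat_of_association(dimer, monomers):
    """Dimer heat of formation minus the sum of the frozen-monomer values."""
    return dimer - sum(monomers)


def per_hydrogen_bond(dimer, monomers, n_bonds):
    """Heat of association divided evenly over the hydrogen bonds."""
    return heat_of_association(dimer, monomers) / n_bonds


for name, (dimer, monomers, n_bonds) in BASE_PAIRS.items():
    total = heat_of_association(dimer, monomers)
    each = per_hydrogen_bond(dimer, monomers, n_bonds)
    print(f"{name:13s} association {total:7.2f}  per H bond {each:6.2f} kcal/mol")
```

Dividing evenly over the bonds is the convention used in the text; it does not imply that the individual hydrogen bonds are equally strong.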

Four pdA local minima were found, separated by 2.7 kcal mol⁻¹. Four pdC minima were located, all within 3 kcal mol⁻¹ of each other. For pdG, the pdG(syn) monomer was 2.5 kcal mol⁻¹ lower in energy than the lowest anti pdG molecule, which, in turn, was 0.25 kcal mol⁻¹ lower in energy than any other local minimum. No significant structural differences were found in these low-lying minima of each individual nucleotide, except that one pdA and one pdC nucleotide lacked an intramolecular hydrogen bond.

Results of PM3 calculations on the stacked dT-pdA duplex, with an overall charge of -2, are displayed in Figure 2 as a function of gradient optimization termination conditions. The top structure in the figure comes from a DNA dodecamer crystal structure, with hydrogens added within SPARTAN's builder module [7]. This structure was the input for PM3 calculations within SPARTAN. The middle structure results from geometry optimization with the gradient tolerance set at 0.001 au and the energy tolerance set to 0.05 kcal mol⁻¹. The bottom structure results from geometry optimization under the conditions of 0.00005 au gradient tolerance and 0.001 kcal mol⁻¹ energy tolerance. Table II contains torsion angles, sugar-puckering values, and heats of formation for the PM3-calculated dT-pdA duplex. Table III compares the sugar puckering for the pdA-pdT dimer, the input dT-pdA duplex from the crystal structure, and the dT-pdA duplex obtained from PM3 geometry optimization (B in Fig. 2). Figure 3 illustrates how PM3 unstacks one of the dT-pdT strands in the absence of hydrogen bonding.

Discussion 


Individual  Base  Pairs 

Intermolecular Hydrogen Bonds. The hydrogen bond distances between the pdA-pdT, pdG-pdC, and pdG(syn)-pdC base pairs are labeled in Figure 1. For the pdA-pdT base pair, the PM3-calculated donor-acceptor distances of 2.827 Å for the A:N6-T:O4 acceptor/donor and 2.819 Å for the T:N3-A:N1 acceptor/donor atoms are slightly less than the experimental values of 2.950 and 2.820 Å from X-ray crystal structures [3c]. Leach and Kollman recently calculated intermolecular hydrogen bond distances and bond angles for the guanine-cytosine and adenine-thymine base pairs with the PM3 method [2r]. Their calculated values of 2.83 and 2.82 Å and 175.4 and 175.8° for the N6-O4 and N3-N1 acceptor/donor pairs differ from ours only slightly with respect to hydrogen-bond angles. This illustrates that the PM3 method does not treat hydrogen bonding of bases differently from hydrogen bonding of nucleosides or nucleotides. The PM3 hydrogen-bond energy of -3.36 kcal mol⁻¹ is smaller in magnitude than the gas-phase hydrogen-bond energy of -6.5 kcal mol⁻¹ determined by temperature-dependent field ionization mass spectrometry (TD-FIMS) [8]. It was previously shown that the PM3 method underestimates hydrogen-bond energies by as much as several kcal mol⁻¹ and underestimates hydrogen-bond distances by 0.1-0.2 Å for small polar molecules [4d]. Leach and Kollman reported lower PM3 association energies for
Figure 2.  (A) Input geometry of a dT-pdA duplex taken from ref. [7]. (B) PM3 geometry after calculations in SPARTAN with an rms G = 0.0003 and energy changes of less than 0.05 kcal mol⁻¹. (C) PM3 geometry after further optimization in SPARTAN that reduced rms G to less than 0.0000 with energy changing by less than 0.002 kcal mol⁻¹. (•) C; (O) H; (©) N; (@) O; (©) P.


base pairing, a consequence of their method of subtracting the sum of the individual fully minimized bases from that of the minimized base pair [2r].

For the pdG-pdC base pair, the PM3 donor/acceptor distances of 2.811, 2.806, and 2.844 Å are slightly shorter than the experimental values [3c] of 2.910, 2.850, and 2.860 Å for the C:N4-G:O6, G:N1-C:N3, and G:N2-C:O2 donor/acceptor distances, respectively. Compared to previous PM3 results [2r] on the base pair G-C, only the hydrogen bond angles differ by 2-5° for the nucleotide base pairs. The experimental TD-FIMS pdG-pdC gas-phase hydrogen bond energy is -7 kcal mol⁻¹




Table II.  Torsion angles and sugar-puckering values (degrees) for the dT-pdA duplex calculated with the PM3 Hamiltonian within SPARTAN.

Parameter                  dT       pdA        dT       pdA

Input
χ (O4'-C1'-N1-C2)        -97.8               -98.0
χ (O4'-C1'-N9-C4)                  -97.8               -98.0
γ (O5'-C5'-C4'-C3')       36.4      36.6      36.3      36.3
β (P-O5'-C5'-C4')       -146.0    -146.0    -146.0    -146.0
P                        11.78     11.75     11.73     11.83
νmax                    -35.65    -35.75    -35.64    -35.66

Output
χ (O4'-C1'-N1-C2)        -94.3               -96.3
χ (O4'-C1'-N9-C4)                 -112.4              -112.0
γ (O5'-C5'-C4'-C3')        a        32.3       a        33.0
β (P-O5'-C5'-C4')          a      -152.4       a      -153.2
P                        -4.19      19.1    -11.64      12.2
νmax                     -27.6     -20.1     -27.1     -18.3

ᵃ Could not be determined because the phosphate group was removed.


[8], within 2 kcal mol⁻¹ of the PM3-calculated value of -5.20 kcal mol⁻¹. The PM3 donor/acceptor distances for the pdG(syn)-pdC base pair are similar to the values for the pdG-pdC base pair: 2.807, 2.816, and 2.845 Å for the C:N4-G:O6, G:N1-C:N3, and G:N2-C:O2 atom pairs, respectively. The PM3 hydrogen bond energy is -4.99 kcal mol⁻¹.


Table III.  Pseudorotation angles of the sugars.

Base      ν0      ν1      ν2      ν3      ν4       P      νmax

Dimer output conformation
A        10.2   -15.9    15.0    -9.7     0.0   -19.6     15.9
T         3.0     0.9    -4.0     6.0    -5.7    48.3      6.0

Stacked base pairs, input
pdT      -4.2    24.9   -34.9    33.2   -18.3   11.78   -35.65
pdA      -4.3    25.0   -35.0    33.3   -18.4   11.75   -35.75
pdT      -4.3    24.9   -34.9    33.2   -18.3   11.73   -35.64
pdA      -4.2    24.9   -34.9    33.3   -18.3   11.83   -35.66

Stacked base pairs, output
dT      -11.0    25.0   -27.5    22.3    -7.5   -4.19   -27.57
pdA       0      12.4   -19.0    19.8   -12.8   19.06   -20.10
dT      -14.0    26.0   -26.5    19.4    -3.8  -11.64   -27.06
pdA      -2.2    13.0   -17.9    17.4    -9.7   12.19   -18.31

tan P = [(ν4 + ν1) - (ν3 + ν0)]/[2 ν2 (sin 36° + sin 72°)]; νmax = ν2/cos P.
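The pseudorotation formula in the table note can be applied directly to the five endocyclic torsion angles. The sketch below is one minimal way to do so. It takes the plain arctangent, as the formula is written, without the 180° quadrant correction that some conventions apply when ν2 is negative; the function name is hypothetical.

```python
import math


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (P, nu_max) in degrees from the five ring torsions in degrees.

    Uses tan P = [(nu4 + nu1) - (nu3 + nu0)] / [2 nu2 (sin 36 + sin 72)]
    and nu_max = nu2 / cos P, with the principal value of the arctangent.
    """
    if nu2 == 0:
        raise ValueError("nu2 = 0: P is undefined")
    s = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))
    tan_p = ((nu4 + nu1) - (nu3 + nu0)) / (2.0 * nu2 * s)
    p = math.degrees(math.atan(tan_p))
    nu_max = nu2 / math.cos(math.radians(p))
    return p, nu_max


# Torsions of the A and T sugars in the dimer output conformation (Table III).
for base, torsions in {"A": (10.2, -15.9, 15.0, -9.7, 0.0),
                       "T": (3.0, 0.9, -4.0, 6.0, -5.7)}.items():
    p, nu_max = pseudorotation(*torsions)
    print(f"{base}: P = {p:6.2f} deg, nu_max = {nu_max:6.2f} deg")
```

For the A sugar this gives P near -19.6° and νmax near 15.9°, matching the tabulated row; the ν2 = 0 case is rejected because P is then undefined.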
Intramolecular Hydrogen Bonds. The nucleotides that compose the base pairs in Figure 1(A) and (B) are in the anti position about the glycosyl C1'-N torsion angle and in the +sc orientation about the C4'-C5' bond. This conformation is stabilized by the C-H···O5' hydrogen bonds of lengths 1.830, 1.851, 1.837, and 1.836 Å for pdA, pdT, pdG, and pdC, respectively. The hydrogen bond angles for these bonds are 164.0, 173.8, 159.9, and 173.9°, respectively. The PM3 calculations are consistent with X-ray crystal structures and deuterium-exchange NMR experiments, which show that the C8 purine and C6 pyrimidine protons are considerably acidic and participate in hydrogen bonding [3c]. The PM3 anti base pairs also have C3'-H···Op intramolecular hydrogen bond lengths of 1.874, 1.871, and 1.870 Å and angles of 165.8, 166.6, and 165.9° for the pdT, pdG, and pdC nucleotides, respectively. The main structural difference between the optimized dimers is that, for the nucleotides in the anti orientation, the pdA that is half of the pdA-pdT dimer and the pdC that is half of the pdG(syn)-pdC dimer do not have the C3'-H···Op intramolecular hydrogen bond. This is reflected by the β torsion angle, which is close to -110° for the anti nucleotides that have the C3'-H···Op intramolecular bond, but changes to approximately -160° for the pdA and pdC nucleotides that do not have this close contact.

For the pdG(syn)-pdC base pair displayed in Figure 1(C), the anti, +sc conformer of pdC is stabilized by the C6-H···O5' hydrogen bond 1.829 Å long. The pdG(syn) nucleotide has apparent hydrogen bonds of 1.888 Å, 154.5° and 1.815 Å, 169.0° for the C5'-H···N3 and N2-H···Op parameters, respectively. In solution and in crystal structures, pdG nucleotides prefer the syn form, and previous calculations have implicated van der Waals and electrostatic attractions between the amino group in the 2 position and the 5' phosphate [3c].

Dimer Bond Distances and Bond Angles. The PM3 calculated bond distances
and bond angles for nucleotides involved in intermolecular base pairing compare
well with averaged data compiled from X-ray crystallographic studies [3c]. The
standard deviation is 0.029 Å for bond lengths and 2.1° for bond angles for the
nonhydrogen atoms in the three investigated base pairs. As shown in Table I, the
torsion angle about the glycosyl bond, χ, is anti (-ac) for all but the pdG(syn, sc)
nucleotide. Rotation about the exocyclic C4'-C5' bond allows O5' to assume three
main conformations relative to the furanose (γ = +sc, ap, or -sc). The three ranges
are not uniformly populated, and for nucleotides, torsion angles γ and χ fall in the
+sc and anti ranges, whereas for the pdG(syn) nucleotide, γ and χ are in the ap
and syn ranges [3c]. The PM3 results, outlined in Table I, show that all of the anti
nucleotides are indeed +sc with respect to the torsion angle γ, whereas the
C5'-H···N3 hydrogen bond in the pdG(syn) nucleotide results in a torsion
angle of -66.3° for γ, falling in the -sc range.
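The range labels used above (sc, ac, ap) can be assigned mechanically from a torsion angle. The following is a minimal sketch, assuming the usual Klyne-Prelog boundaries at 30°, 90°, and 150°; the function name and the treatment of angles lying exactly on a boundary are illustrative choices, not taken from the article.

```python
def klyne_prelog_range(angle_deg):
    """Return the Klyne-Prelog label (sp, +sc, +ac, ap, -ac, -sc) for a torsion angle."""
    # Wrap the angle into the interval [-180, 180).
    a = (angle_deg + 180.0) % 360.0 - 180.0
    mag = abs(a)
    if mag <= 30.0:
        return "sp"   # synperiplanar
    if mag >= 150.0:
        return "ap"   # antiperiplanar
    sign = "+" if a > 0 else "-"
    # Assumption: an angle of exactly 90 degrees is counted as anticlinal.
    return sign + ("sc" if mag < 90.0 else "ac")


if __name__ == "__main__":
    # Torsion angles quoted in the text.
    for label, value in [("gamma, pdG(syn)", -66.3),
                         ("beta, with C3'-H...OP contact", -110.0),
                         ("beta, without the contact", -160.0)]:
        print(label, value, klyne_prelog_range(value))
```

With these boundaries, the γ value of -66.3° falls in the -sc range and the β values near -110° and -160° fall in the -ac and ap ranges, matching the assignments in the text.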

Sugar-puckering modes of the furanose ring are described by the pseudorotation
cycle. In nucleotide structures, two ranges of pseudorotation angles are preferred:
C3'-endo at -18° < P < 36° (North) and C2'-endo at 130° < P < 200° (South)
[9]. Although puckering at the C3' and C2' atoms is most commonly observed, the
furanose ring is disordered in some crystal structures [10]. Interconversion between
C3'-endo and C2'-endo puckering modes is extremely rapid, requiring approximately
5 kcal mol⁻¹ activation energy [3c]. The sugar puckers calculated by PM3 are on
the  North  side  of  the  pseudorotation  cycle,  with  Cy-exo  being  the  major  puckering 
mode for all but the pdA nucleotide. For pdA, the furanose is twisted with major
C2'-exo and minor C1'-endo puckering. A recent comparison of different computational
methods for the conformational analysis of ring systems shows that AM1
and PM3 calculations optimized dideoxyribose to a near planar configuration [10].
Our results, with νmax values no higher than 16° for pdA in the pdA-pdT dimer
(Table I), show the inability of PM3 to accurately model the furanose ring in
nucleotides.

Crystal structures of nucleotides show that the torsion angle β, which defines
rotation about the C5'-O5' bond, is limited mainly to the ap range with some
structures in the ac range [3c]. The nucleotides pdA, pdG(syn), and pdC (from
the pdG syn dimer) are in the normal ap range, whereas pdT, pdG, and pdC are
in the ac range. Only the structures in the ac range form the C3'-H···OP
intramolecular hydrogen bond.

Monomer Bond Distances and Bond Angles. The lowest-energy PM3 monomer
nucleotide structures have bond lengths and bond angles consistent with averaged
crystallographic data [3c]. The standard deviations are 0.25 Å for bond lengths
and 2.1° for bond angles. All the intramolecular hydrogen bonds found in the
dimer calculations also exist in the minimized monomers. Table I reveals that
rotations about the glycosyl bond (χ), the C4'-C5' bond (γ), the C5'-O5' bond
(β), and preferred sugar puckerings for the monomers deviate only slightly from
the dimer structures. Nucleotides are flexible molecules, and the PM3 calculations
illustrate that many local minima are possible. Slight adjustments to the models
followed by full minimization produced a pdA molecule with the
C3'-H···OP intramolecular interaction and a pdC molecule without the
C3'-H···OP close contact. Our lowest-energy structure for the pdA nucleotide
monomer does not show this close contact, but another local minimum with
the intramolecular bond is 2.1 kcal mol⁻¹ higher in energy.

SPARTAN Base Pairs. The implementation of PM3 within SPARTAN differs
from that in MOPAC. Standard tolerances for termination of geometry optimization
are much tighter in SPARTAN than in MOPAC. In addition, there is an error in
the calculation of the hydrogen bond energy for charged nucleotide base pairs in
MOPAC that is partly rectified in SPARTAN. Optimization of the anionic pdA-pdT
base pair (charge = -2) in MOPAC, followed by the 1SCF calculations on the
individual nucleotides (each with a charge of -1), gives positive rather than negative
association energies. Positive association energies on the order of hundreds of kcal
mol⁻¹ with MOPAC are reduced to approximately 20 kcal mol⁻¹ with SPARTAN.
The problem is not completely corrected in SPARTAN, but the error is certainly
reduced.

Comparison of the charged and uncharged pdA-pdT dimers calculated with the
PM3 Hamiltonian within SPARTAN reveals that the overall structure remains fairly
similar between the charged and uncharged species. The only angle that changes
in a significant way is for the rotation about the C5'-O5' bond (β). The value for
β changes from -161° to -128° for pdA and from -105° to -114° for pdT. This
change is easily accommodated as the phosphate group is very flexible. All the
intramolecular hydrogen bonds present in the neutral dimer remain in the doubly
charged dimer.

Comparison of the SPARTAN neutral pdA-pdT dimer with the MOPAC dimer
reveals that the two minima are virtually identical. The SPARTAN values for χ,
γ, β, ν0, ν1, ν2, ν3, and ν4 of -134.6°, 47.3°, -160.8°, 10.3°, -14.8°, 13.3°,
-7.9°, and -1.4°, respectively, for pdA are within 2° of those for the MOPAC
dimer reported in Tables I and III. Similarly, the SPARTAN values for χ, γ, β,
ν0, ν1, ν2, ν3, and ν4 of -122.8°, 46.0°, -113.8°, 3.5°, 0.2°, -3.3°, 5.5°, and
-5.7°, respectively, for pdT are very close to those values reported for the MOPAC
dimer. The heat of formation for the SPARTAN pdA-pdT uncharged dimer is
lowered by 0.07 kcal mol⁻¹ from the MOPAC calculation (with PRECISE).
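The pseudorotation phase angle P and the puckering amplitude νmax discussed earlier can be estimated from the five endocyclic torsion angles. The sketch below assumes the standard Altona-Sundaralingam relation, tan P = [(ν4 + ν1) - (ν3 + ν0)] / [2 ν2 (sin 36° + sin 72°)] with νmax = ν2 / cos P, and assumes that the ν0 to ν4 labels quoted above follow that convention; the function name is illustrative.

```python
import math


def pseudorotation(nu0, nu1, nu2, nu3, nu4):
    """Return (P, nu_max) in degrees from the endocyclic torsions nu0..nu4."""
    numerator = (nu4 + nu1) - (nu3 + nu0)
    denominator = 2.0 * nu2 * (math.sin(math.radians(36.0)) +
                               math.sin(math.radians(72.0)))
    # atan2 places P in the correct quadrant, which is equivalent to the
    # usual rule of adding 180 degrees when nu2 is negative.
    p_rad = math.atan2(numerator, denominator)
    # Note: this division is ill-conditioned when nu2 is close to zero.
    nu_max = nu2 / math.cos(p_rad)
    return math.degrees(p_rad) % 360.0, nu_max


if __name__ == "__main__":
    # SPARTAN torsions for pdA in the neutral pdA-pdT dimer, as quoted above.
    p, nu_max = pseudorotation(10.3, -14.8, 13.3, -7.9, -1.4)
    print(f"P = {p:.1f} deg, nu_max = {nu_max:.1f} deg")
```

For the pdA values this gives P close to 336° and νmax close to 14.6°, which lies near the C2'-exo region of the cycle and below the 16° amplitude quoted for pdA.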

Stacked  Base  Pair 

Figure 2 and Table II contain the results of PM3 calculations on the stacked dT-pdA
system. Table II analyzes structure B in Figure 2. Table II reveals that the
torsion angles and overall sugar-puckering values are very similar for each nucleotide
in the input structure A obtained from X-ray crystallography [7]. Comparing the
input with the PM3 output shows that the crystal fragment relaxes in this gas-phase
calculation, with the biggest change in χ for pdA. The 15° change in χ rotates the
adenine further away from the plane of the sugar, still well within the anti range.
Table III shows how stacking DNA bases influences the sugar puckering within
PM3. The input structure for the stacked dT-pdA duplex is quite puckered, whereas
the output from the PM3 calculation is less puckered but still much more puckered
than is the pdA-pdT dimer. The phosphate groups in DNA molecules lose much
of the flexibility observed in a single base pair, and the intramolecular hydrogen
bonds observed between the phosphate oxygens and the sugar C-H disappear in
the dT-pdA duplex. The constraints of the phosphate group remove the intramolecular
hydrogen bonds, and as a result, the sugar puckering is reproduced much
better in the duplex structure.

Although the PM3 method reproduces intermolecular hydrogen bonds very
well for a parameterized method, it is not as successful for the stacking interactions
of DNA bases. The driving force for DNA helix formation is believed
to come from stacking interactions of DNA bases rather than from hydrogen
bonding between the base pairs [3c]. This is because in solution DNA bases are
already hydrogen-bonded with water molecules, and pairing of the bases can
only occur after breaking base-water hydrogen bonds. Structures B and C in
Figure 2 show that geometry optimization is driving the bases apart. The heat
of formation for B is -824 kcal mol⁻¹, the rmsG = 0.0003, and the energy is
changing by less than 0.05 kcal mol⁻¹. In C, the heat of formation is -830.758
kcal mol⁻¹, the rmsG is less than 0.0000, and the energy is changing by less
than 0.002 kcal mol⁻¹. The hydrogen bonds between the base pairs are a stabilizing
influence on the stacking disruption. Figure 3 shows that removing the
hydrogen bonds allows the bases to quickly unfold (heat of formation = -554.841
after converging within the default gradient tolerances). The PM3-optimized dT-pdT
structure unstacked the thymine bases and formed the C-H···OP
intramolecular hydrogen bond. This intramolecular interaction was not present in
the input geometry. These calculations suggest that if a DNA helix of many base
pairs were geometry-optimized by the PM3 method, the middle bases would
hold together fairly well, but the terminal bases would be subject to unstacking
at high gradient tolerances.

Figure 3. PM3 geometry-optimized structure of a dT-pdT single strand. (•) C; (O) H;
(©) N; (@) O; (€) P.



Conclusions 

The PM3 semiempirical quantum mechanical method is capable of reproducing
the structures of nucleotides and nucleotide base pairs. Nucleotide bond distances,
bond angles, torsion angles about the glycosyl bond (χ), torsion angles
about the C4'-C5' bond (γ), and torsion angles about the C5'-O5' bond (β)
agree with experimental results. Hydrogen bond energies are underestimated by
2-3 kcal mol⁻¹. Intermolecular hydrogen bond lengths between nucleotides
displaying Watson-Crick base pairing are 0.1-0.2 Å less than experimental results.
Sugar puckering is not reproduced by the PM3 method for nucleotide base
pairs, although results do fall on the north side of the pseudorotation cycle.
There are many possible conformations of nucleotides. PM3 calculations reveal
that many of the most stable conformations are stabilized by intramolecular
C-H···O hydrogen bonds. Stacking of bases constrains the phosphate
connecting the sugars in DNA, removing the intramolecular C-H···O
hydrogen bonds between the sugar and the phosphate, with the result that sugar
puckering is now modeled quite well. PM3 stacking interactions are repulsive,
rather than attractive, with the result that more stringent gradient tolerances
tend to slowly unstack the dT-pdA duplex. It is quite likely that PM3 calculations
on a bigger helix with less stringent gradient tolerances than are the norm in
SPARTAN would adequately model the center portion of the helix. Future
parameterizations of semiempirical methods should include base-stacking interactions
in the parameterization set. Until that time, PM3 should be used with
caution for modeling DNA chemistry.
Abstract

The study of water in macromolecular crystals is approached with a restrained molecular dynamics
method that makes use of X-ray diffraction data, without the need of thermal B factors for the solvent.
This method, called here solute-grid-restrained molecular dynamics (SGRMD), is applied to a test case
of a simulated crystal of erythrol. The results are quite satisfactory, and it is concluded that the method
can be useful to study real macromolecular crystals. © 1994 John Wiley & Sons, Inc.


Introduction 

The problem of obtaining the contribution of the solvent to X-ray diffraction
intensities remains a major task. For the case of ordered water molecules visible in
a difference map, it is highly labor-intensive and dependent on a possible “human
factor” in the interpretation of density peaks as water molecules. For the case of
the contribution of disordered solvent, we can cite the work of Blake et al. [1], who
used a uniform solvent density; that of Cheng and Schoenborn [2], who modeled
the disordered solvent with closely spaced pseudoatoms; and that of Badger and
Caspar [3], who used an iterative density modification procedure to obtain a “solvent
density.”

In this work, we proposed to obtain simultaneously a complete solvent model,
both for ordered and disordered molecules, as well as the corresponding density.
To do so, we devised an “experimentally biased” molecular dynamics simulation
of a fully solvated crystal. The X-ray experimental data act as a perturbation on
the modeling forces, in a way different from the one used in simulated annealing.
Instead of calculating a contribution based on atomic positions, which requires
knowledge of the B-factors, an intermediate occupancy grid is used both for accumulating
water trajectories and for introducing forces dependent on electron density
difference maps calculated from X-ray amplitudes.

Algorithm 

All  code  was  written  based  on  the  existing  GROMOS  MDX  package  (BIOMOS); 
permission  for  modification  was  granted  by  the  owners  (Profs.  H.  J.  C.  Berendsen 
and  W.  F.  van  Gunsteren) .  All  additional  routines  were  also  written  in  FORTRAN. 
These  additional  routines  concern  the  calculation  of  the  solvent  occupancy  grid 
and  of  the  perturbation  force. 

The  Occupancy  Grid 

The method is based on the construction of a fine grid covering the solvent region
of the simulation box. Each grid point has an associated water occupancy. The
first step consists of filling a box (of the same dimensions as the crystal unit cell)
with the solute molecule, which has already been refined by standard methods, and
randomly distributed water molecules. This system is run through dynamics steps,
imposing position restraints on the solute and letting the water move following the
prescriptions for molecular dynamics simulation. Water trajectories are kept and
used to calculate the initial occupancy grid (Fig. 1, step 1).

Figure 1. Flowchart of the procedure.

To measure the instantaneous occupancy, the solvent volume is divided into pixels
(smaller than a water molecule) centered at grid points. Each pixel is considered to be
occupied by a water molecule if the center of the molecule lies inside it. This water
molecule also contributes to neighboring pixels, the contribution diminishing with the
distance. The instantaneous occupancy of the pixel is then associated with the grid point.
The mean occupancy of a particular grid point at step n [Oc(n)] is defined as a
weighted average of the instantaneous occupancies (Oc_inst). Since the system is
evolving, old configurations must be gradually discarded and, also, the instantaneous
occupancy has to be gradually introduced. To accomplish this without having to
store old configurations, the occupancy in the nth step is computed as

$$Oc(n) = \frac{Oc(n-1)\,(L - Q) + Oc_{\mathrm{inst}}\,Q}{L}. \tag{1}$$

The ratio Q/L gives the weight applied to the incoming configuration. In our
tests, Q = 1 and L = 100.
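Equation (1) is a running (exponentially weighted) average, so only the current grid needs to be stored. A minimal sketch in Python with NumPy follows; the function name, the grid size, and the single occupied pixel are illustrative assumptions, and the original routines were written in FORTRAN.

```python
import numpy as np


def update_occupancy(oc_prev, oc_inst, q=1.0, l=100.0):
    """Apply Eq. (1): blend the stored mean occupancy with the instantaneous one."""
    return (oc_prev * (l - q) + oc_inst * q) / l


if __name__ == "__main__":
    mean_grid = np.zeros((4, 4, 4))      # hypothetical 4 x 4 x 4 occupancy grid
    instantaneous = np.zeros((4, 4, 4))
    instantaneous[1, 2, 3] = 1.0         # one pixel occupied in every frame
    for _ in range(100):
        mean_grid = update_occupancy(mean_grid, instantaneous)
    print(mean_grid[1, 2, 3])            # equals 1 - 0.99**100, about 0.63
```

With Q/L = 0.01, a pixel that becomes permanently occupied reaches about 63% of its final mean occupancy after 100 steps, so the effective memory of the average is on the order of L/Q steps.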

The  Perturbation  Force 

The simulation proceeds starting from the previously calculated grid, applying
to each water molecule an extra force, as described in what follows. Mean water
occupancies at grid points yield an absolute solvent electron density, considering
that a fully occupied site corresponds to the scattering of a water molecule (oxygen
atom). As indicated before, the occupancy of a grid point by a molecule contributes
to the electron density over a neighborhood that covers a number of grid
points. The electron density is computed as

$$\rho(i,j,k) \propto \sum_{l=i-p}^{i+p} \; \sum_{m=j-p}^{j+p} \; \sum_{n=k-p}^{k+p} W(\mathrm{In})\, Oc(l,m,n), \tag{2}$$

where p is the maximum neighboring grid point that contributes to electron density
and W(In) is the weight of the contribution of neighbor level In (the neighbor index).
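The triple sum of Eq. (2) can be sketched as a weighted sum over shifted copies of the occupancy grid. The text does not define how the neighbor level is computed, so this example assumes it is the largest index offset along any axis, and it assumes periodic wrapping at the box edges as for a crystal unit cell; the function name and the weights are illustrative.

```python
import numpy as np


def solvent_density(oc, weights):
    """Eq. (2) up to a constant factor; weights[d] is W for neighbor level d."""
    p = len(weights) - 1                 # largest neighbor offset that contributes
    rho = np.zeros_like(oc, dtype=float)
    for dl in range(-p, p + 1):
        for dm in range(-p, p + 1):
            for dn in range(-p, p + 1):
                # Assumed definition of the neighbor level (not from the text).
                level = max(abs(dl), abs(dm), abs(dn))
                # Rolling by the negative offset brings Oc(i+dl, j+dm, k+dn)
                # to position (i, j, k), with periodic wrapping.
                shifted = np.roll(oc, shift=(-dl, -dm, -dn), axis=(0, 1, 2))
                rho += weights[level] * shifted
    return rho


if __name__ == "__main__":
    occupancy = np.zeros((5, 5, 5))
    occupancy[2, 2, 2] = 1.0             # a single fully occupied grid point
    density = solvent_density(occupancy, weights=[1.0, 0.5])
    print(density[2, 2, 2], density[2, 2, 3], density[0, 2, 2])
```

A single occupied point is spread over its 26 nearest grid neighbors with the level-1 weight, while points farther away receive nothing.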

This solvent electron density (Fig. 1, step 2) then contributes to the calculated
structure factor F_calc. From the total calculated structure factor (the contribution
of the solute molecule and the solvent grid) and the experimentally observed structure
factor F_obs, a difference density map,

$$\Delta\rho = \mathcal{F}\left\{\left(|F_{\mathrm{obs}}| - |F_{\mathrm{calc}}|\right)\exp(i\varphi_{\mathrm{calc}})\right\}, \tag{3}$$

is obtained (Fig. 1, step 3). A positive value for the density difference at a point
indicates a lack of water molecules at that point, and a negative value, an
excess. This density difference map can be used to obtain a map for each grid point
that will generate a force centered on the grid point, attracting or repelling water
molecules as needed to agree with the experimental data. This force
is scaled to be of the same order as the modeling forces. This scale can be varied
in order to give more or less weight to the crystallographic contribution. The difference
density map is not an instantaneous picture of the solvent distribution but an
average over a number of solvent configurations. It is updated fast enough to allow
a convenient advance and slowly enough to avoid fast fluctuations. All the dynamics
now evolve with the two contributions: the modeled forces and the grid-mediated
forces (Fig. 1, step 4). We can call the method the solute-grid-restrained molecular
dynamics (SGRMD).
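The difference map of Eq. (3) can be illustrated on a periodic grid with a discrete Fourier transform. This is a sketch under stated assumptions: it uses NumPy's FFT sign and normalization conventions rather than crystallographic ones, takes the calculated density directly as a grid, and ignores resolution limits and symmetry; the function name is illustrative.

```python
import numpy as np


def difference_map(f_obs_amplitude, rho_calc):
    """Real-space map from (|F_obs| - |F_calc|) combined with the calculated phases."""
    f_calc = np.fft.fftn(rho_calc)                    # calculated structure factors
    phase_factor = np.exp(1j * np.angle(f_calc))      # exp(i * phi_calc)
    coefficients = (f_obs_amplitude - np.abs(f_calc)) * phase_factor
    return np.fft.ifftn(coefficients).real


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rho_model = rng.random((6, 6, 6))                 # hypothetical model density
    amplitudes = np.abs(np.fft.fftn(rho_model))
    # Observed amplitudes equal to the calculated ones give a flat map.
    print(np.allclose(difference_map(amplitudes, rho_model), 0.0))
    # Doubling every observed amplitude returns the model density itself,
    # i.e. positive difference density wherever the model is "missing" scatterers.
    print(np.allclose(difference_map(2.0 * amplitudes, rho_model), rho_model))
```

In the method described above, such a map would be accumulated over several solvent configurations before being turned into grid-centered forces.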


Testing 

The method has been tested on pseudoexperimental data for erythrol with three
types of hydration water: five “fixed” molecules (with a thermal factor B = 4),
five in the “second hydration shell” (B = 16), and 16 as “bulk water” (B = 80).
The water model corresponds to the SPC/E model [4], and erythrol has been modeled
and simulated already [5]. Figure 2 shows the radial distribution of water molecules
around the oxygen atoms during the last 10 ps of this simulation.

The test system has the advantage of being small (programs run fast during the final
debugging) and that we know the results of the simulation, i.e., the behavior
of the model. It has the disadvantage of being a rather artificial crystal. Some water
molecules selected in the “first hydration shell” are known to be on sites of short
lifetime due to the proximity of a hydrophobic corner of erythrol. In all these tests,
the value of Q/L was 0.01, i.e., at every point, the occupancy is the average of 100
molecular dynamics steps.


Results 

The  original  occupancy  grid  was  obtained  by  averaging  the  solvent  configurations 
from  a  molecular  dynamics  run  of  600  steps  of  0.002  ps  without  X-ray  constraints. 


Figure 2. Radial distribution of water molecules around the oxygen atoms during the
last 10 ps of the simulation of erythrol in water. Note the peaks at 0.3 nm (3 Å) around O2
and O3, showing the existence of a hydrogen-bonded hydration shell. This peak is not
present around O1 and O4, showing that they have a different electronic distribution that
gives them a more hydrophobic character.

The X-ray restraints were then introduced. Two different resolutions were considered:
0.15 and 0.2 nm. At 0.15 nm resolution, four molecular dynamics runs of
500 steps of 0.002 ps each were performed.

At 0.2 nm resolution, two sets of runs were done:

1. Four molecular dynamics runs of 500 steps of 0.002 ps each.

2. Twelve molecular dynamics runs of 500 steps of 0.002 ps each.

The results are shown (see Figs. 3-5) as the superposition of the original configuration
on the simulated electron density. Some interesting features are observed:

• In the water occupancy map using forces from the 0.15 nm resolution data and
averaging over 4 ps, many peaks appear, and all but one “fixed” water molecule
are positioned in a peak or quite near to one. Water “channels” can be seen
(Fig. 3).


Figure 3. The erythrol molecule, marked as S1 (Solute 1), and its surface.
The oxygen O4 is marked by an X. Also shown are the averaged solvent occupancy map
(solid line lattice) after 4 ps of simulation using 0.15 nm resolution for X-ray forces,
contoured at 0.4 of the maximum value, and the water molecules used for calculating the
pseudo F_obs. One strongly diffracting water molecule near O4 is highlighted (dotted circles).
Note that the neighborhood of this site is marked by the occupancy map. Note also the
existence of water channels.

Figure 4. Same as Figure 3, but using 0.2 nm resolution X-ray data to calculate the
X-ray forces. Note that the regions near O4 and O1 are much emptier. The map shows
several peaks, which appear as small regions when contouring at 40% of the maximum
value.

• The map corresponding to 0.2 nm resolution data and also averaging over 4 ps
shows some remarkable differences. Around O4 of erythrol, we selected one of
the “fixed” waters. As is known from simulation, the O4 is, for the model, a
“hydrophobic oxygen” (as the radial distribution function shows in Fig. 2).
Whereas in the previous map there was water near O4, in this case, the map grid
is empty (Fig. 4). The model needs an empty space and the “created” X-ray
forces need water, and, therefore, a “push-pull” process is produced, which here
favors the model.

• A second map using 0.2 nm resolution data was obtained after averaging over
12 ps. The map is flatter and the water channels are less contrasted. It is seen
that in this case the hydrophobic zone near O4 is occupied more (Fig. 5), meaning
that the longer simulation time allows for the action of the crystallographic forces.

Conclusions 

The algorithm seems to work properly. It tends toward an equilibrium where
the crystallographic forces modify the occupancy grid, which shows the strongly
diffracting water molecules.

Figure 5. Same as Figure 4, but averaging over 12 ps instead of 4. Note that the map
tends to be flatter, as expected from a longer averaging time, and that the occupation
density now points toward the highlighted high occupation water molecule, meaning that
the longer simulation time is necessary for the action of the X-ray forces.

The analysis of hydration cannot be done only through the inspection of the
simulated density, however descriptive it might be. It should be complemented by
radial distribution functions and hydration lifetime studies on particular sites.

The results shown here are just a test of the algorithm. Therefore, the next step
will be the application to a real system. At present, we are using this method to
analyze experimental data for an RNA tetradecamer obtained in one of our
laboratories [6].
Abstract

Artemisinin and related molecules are potential antimalarials that contain the 1,2,4-trioxane ring
system. Several new derivatives have been synthesized and tested in Geneva, and this article presents
the results of a systematic study of the structure of these molecules, both by the semiempirical PM3
method and using ab initio SCF methods. The results highlight the feasibility of full optimizations with
3-21G and 6-31G* basis sets for these large molecules. Molecular electrostatic potential (MEP) maps are
evaluated and used in an attempt to identify the key features of the molecules that are necessary for their
activity. There is good agreement between the PM3 and ab initio maps as to the qualitative predictions.
© 1994 John Wiley & Sons, Inc.

Introduction 

The worldwide emergence of resistance to the forms of malaria spread by the
parasite Plasmodium falciparum has meant a large increase in the incidence of the
disease and mortality from it, particularly in Africa and Asia [1]. Although there
are several drugs that were formerly effective against this form of malaria, in particular,
chloroquine, which accounts for 80-90% of all antimalarial drugs in current
use [2], resistant strains are unaffected by this molecule and other chemically related
nitrogen heterocycles. The search for new antimalarial drugs that are effective against
this form of malaria thus has a very high priority in antimalarial drug design.

Fortunately, Chinese researchers have found a new lead compound, artemisinin
(formerly Qinghaosu) [Fig. 1(a)], in extracts from herbs that have been used in
China for thousands of years [3]. This compound is a sesquiterpene containing the
1,2,4-trioxane ring structure (b), and a variety of derivatives of this molecule, namely,
arteether (c), and several other derivatives (d) have been chosen for clinical
trials [4].

Figure 1. (a) Structure of artemisinin; (b) 1,2,4-trioxane; (c) arteether; and (d)
desoxyartemisinin.

However, all these compounds have disadvantages in terms of difficulties in administration
due to poor solubility or stability, and a more rational approach to
the development of better drugs of this type was clearly needed. One such approach
that we have adopted is to try to identify the unique features of these molecules
that are necessary for activity and to suggest new molecules for synthesis on the
basis of these features.

For several years, one of the authors (C. W. J.) has pursued this approach from
an experimental viewpoint [5], but in this article, we describe an alternative strategy,
which is to investigate the molecules developed in this experimental program by
theoretical techniques that we believe yield a more detailed rationalization of the
observed structure-activity data. We previously described some earlier work on (1)
[6], and the present article extends the calculations to several molecules that have
turned out to be more effective antimalarials [7], but which have not so far been
studied in detail by the methods of computational chemistry. A second article will
deal with a class of related molecules based on a cis fused cyclopenteno-1,2,4-trioxane
structure [8].



Background  to  the  Development  of  the  Relevant  Molecules 
Related  to  Artemisinin 

Several earlier studies have addressed the question of how much of (1) is necessary
for antimalarial activity [3,7]. It is known that the endoperoxide link in (1) is
necessary, because desoxyartemisinin (4), where only the ether bridge remains, is
inactive [9].

A variety of studies on cyclic peroxides and tetrahydrobenzopyran-derived 1,2,4-trioxanes
have shown that neither the peroxide function nor the 1,2,4-trioxane ring
alone is sufficient to confer antimalarial activity. The rings in (1), labeled A, B,
C, and D, were removed in earlier work by C. W. J. to give derivatives of varying
activity. We refer to these compounds with the numbers given in the recent review
by Jefford and co-workers [7] (Fig. 2). The antimalarial data are given in Table I.

Compound 25, in which ring D is removed, is active, but compound 19 containing
the cyclopentane ring is more active. This shows that ring D is not necessary. The
resulting systems may be called ABC structures, and the X-ray structure of one of
these molecules is known [10]. In (25), the trioxane ring is in the chair form,
whereas it is in a distorted boat form in (1). Compounds based on the ACD ring
structure have also been prepared [11]. In particular, compounds (22) and (23),
which differ only by the CH3O group being endo (22) or exo (23), are both active,
but the endo isomer (22) is slightly more active.

The X-ray structures of these molecules [10] show the trioxane ring to be like
that in (1), namely, a twisted boat. The inactive compound (26) has also been
studied by X-ray diffraction. Derivatives in which the CH3 group is replaced by
Ph have also been prepared. These are also effective, more so than the CH3
derivatives [10].

Further development of simpler molecules exploited the above observations. In
particular, fusion of the cyclohexane A ring in a cis conformation with another
ring system seemed like a potentially good structure.

Although the first series made, as fused naphtheno-1,2,4-trioxanes, were not active,
a new series with cis-fused cyclohexeno derivatives were weakly active. However,
replacement of the cyclohexene ring with cyclopentene gave much more active
derivatives, some of which approach (1) in activity [7].

Table I. In vitro antimalarial activity of some artemisinin analogs against
P. falciparum clones.

              W2 clone (ng/mL)        D-6 clone (ng/mL)
Compound      IC50       IC90         IC50       IC90
1             1.1        -            2.2        -
19            2.0        3.3          2.3        30.4
25            6.2        25.8         28.7       39.8
22            1.8        3.9          16.5       11.5
23            9.7        16.9         75.4       8.75
26            Inactive                Inactive

Extensive synthetic work on these compounds has resulted in a great deal of
structure-activity (S/A) data [7]. The aim of the present article was to try to rationalize
these data in terms of the molecular structure and properties of the molecules,
and in this article, we focus on artemisinin [compound (1)] and the related
compounds (19), (22), (23), (25), and (26).

Computational  Strategy 

Theoretical Geometries. All the molecules that have antimalarial activity are
relatively large and contain several rings with conformational flexibility. We
previously showed [12] that reliable geometries of this type of molecule can be calculated
by semiempirical SCF methods. We showed that the computed structure of (1)
using the PM3 method was in excellent agreement [6] with the experimental X-ray
structure [13], and this is the preferred method for calculations in this series of
compounds [the AM1 method gives values of R(O-O) that are too long]. We
found similar agreement for other molecules for which we have X-ray structures.
Hence, our initial studies of the molecules in the present article were carried out
at the PM3 level of SCF theory.

However, because we wish to use the molecular electrostatic potential (MEP)
maps as a guide to the S/A relationships, we felt it necessary to evaluate the structures
and wave functions computed by the ab initio SCF method using at least a
split-valence (3-21G) basis set. For selected molecules, we also optimized the geometries
with a 6-31G* basis set to check the reliability of the 3-21G predictions.

The geometry optimizations were carried out without any constraints, i.e., all
3N-6 variables were optimized (Table II). This is probably not necessary for every
variable, e.g., C-H bond lengths, but the lack of symmetry in most of these
molecules makes it at least a consistent procedure for each one. We also found that
the optimizations proceeded more uniformly when all variables were optimized.
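
As a quick check on the variable counts quoted in Table II, the number of internal degrees of freedom of a nonlinear molecule is 3N-6. The sketch below is illustrative only: the function name is hypothetical, and the molecular formulas (C15H22O5 for artemisinin, C15H22O4 for deoxyartemisinin) are supplied here rather than taken from the text.

```python
def internal_coordinates(n_atoms: int, linear: bool = False) -> int:
    """Number of internal degrees of freedom: 3N-6 (3N-5 for a linear molecule)."""
    if n_atoms < 2:
        raise ValueError("need at least two atoms")
    return 3 * n_atoms - (5 if linear else 6)


# Atom counts from the molecular formulas (an assumption of this sketch).
artemisinin_atoms = 15 + 22 + 5        # C15H22O5
deoxyartemisinin_atoms = 15 + 22 + 4   # C15H22O4

print(internal_coordinates(artemisinin_atoms))       # 120, as listed in Table II
print(internal_coordinates(deoxyartemisinin_atoms))  # 117, as listed in Table II
```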


Table II. Computed values of E(SCF) at the 3-21G and 6-31G* optimized
geometries and number of variables optimized.

                      Basis set
Molecule              3-21G         6-31G*         3N-6
Artemisinin           -949.8037     -955.06882     120
Deoxyartemisinin      -875.4588     -880.3273      117
Jefford 19            -915.3292     -920.3837      132
Jefford 22            -761.2065     -765.407870    102
Jefford 23            -761.2044     -765.40798     102
Jefford 26            -761.1932     --             102
Pj26                  -609.1052     --             --
Jefford 25            -838.8552     -843.4582      120

Molecular Properties and S/A Correlations. Since the precise mechanism of
action and the relevant receptor are not known for these compounds, we attempted to
rationalize the observed S/A data in two ways: We computed the MEP on molecular
surfaces around each molecule and examined the features of the MEP as described
below [14-16]. We also attempted to quantitatively compare various similarity
indices [17], although this has so far not been as successful as hoped.
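
To illustrate what a quantitative similarity index looks like, the sketch below evaluates a Carbo-type index (a normalized overlap) for two properties sampled on a common set of grid points. This is one common choice and is not necessarily the index used in [17]; the function name and the sample values are hypothetical.

```python
import math


def carbo_index(prop_a, prop_b):
    """Carbo-type similarity of two properties sampled on the same grid points.

    Returns sum(a*b) / sqrt(sum(a*a) * sum(b*b)), which lies in [-1, 1].
    """
    if len(prop_a) != len(prop_b):
        raise ValueError("both properties must be sampled on the same grid")
    overlap = sum(a * b for a, b in zip(prop_a, prop_b))
    norm_a = math.sqrt(sum(a * a for a in prop_a))
    norm_b = math.sqrt(sum(b * b for b in prop_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a property that is zero everywhere has no defined index")
    return overlap / (norm_a * norm_b)


# Hypothetical MEP samples for two molecules on four shared surface points.
mep_a = [-0.05, -0.02, 0.01, 0.03]
mep_b = [-0.04, -0.03, 0.02, 0.03]
print(round(carbo_index(mep_a, mep_b), 3))
```

An index of this kind is sensitive to how the two molecules are superimposed and to the choice of grid, which is one reason such comparisons can be hard to make precise.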

The Study of Model Compounds. In view of the expense of calculations as large
as those described above, it is sensible to investigate first the structure of the
basic molecular skeleton of the active drug molecule, omitting all CH3 and
other groups, such as OH. We call these prototype molecules Pxx; they have
the basic ring structure of the active drugs. Optimization of these Pxx compounds
is relatively fast, and the main conformational features should be present in these
compounds. We then add the relevant CH3, OH, and C6H5 groups and reoptimize
to get the full structure.

Computational  Facilities  and  Software 

This project was carried out both at St. Andrews, where the PM3 calculations
were performed on a Tektronix CAChe molecular modeling system [18] and the
ab initio calculations on a two-processor FPS-500 mini-supercomputer, and in
Geneva, where a Silicon Graphics Crimson system was used for
both SCF calculations and graphical display of the results. Some of the largest
calculations were also carried out at the Swiss Supercomputer Centre at Manno on
an NEC SX-3 supercomputer.

The programs used were mainly MOPAC-6 [19], GAUSSIAN 90 [20], and
GAUSSIAN-92 [21] on our own machines and, to a lesser extent, GAUSSIAN-92
on the SX-3. In view of the number of basis functions needed (for most cases, at
least 150), it was more efficient to carry out the SCF calculations using the direct
SCF method, and such calculations run very efficiently with GAUSSIAN-92.

Visualization of the results was done either on the CAChe system at St. Andrews
or, more recently, using the MOLEKEL software developed in the Geneva laboratory
by Flükiger [22]. MEPs were computed from the Mulliken charges and displayed
on a Connolly surface. We also examined the MEP computed with the Merz-Kollman
procedure [23]. MOLEKEL is a sophisticated program for the display
of molecular structures and properties developed for Silicon Graphics workstations
and has been used extensively in this study.

Results 


Structural  Aspects 

It is convenient to divide the results into two sections that follow the historical
development of these compounds by one of us. First, we compare artemisinin (1)
with the active compounds containing the ACD ring and also with compounds
(19) and (25). Second, we compare these active molecules with the inactive (23)
and (26). The second article in this series will deal with the cyclopenteno derivatives.
Comparison of Artemisinin (1), (19), (25), (22), (23), and (26). As pointed out
in our earlier article, the calculated structure of (1) is in excellent agreement with
experiment at the PM3 level [6]. The present ab initio results and the earlier results
are given in Table II. It is particularly important to establish how well different
levels of theory perform for this molecule, because of the expense of full
optimizations for molecules containing so many atoms. The total energies for the 3-21G
basis set are given in Table II and also for the 6-31G* basis set where these are
available. Table III gives the calculated values of the ring parameters in artemisinin
for the different methods. The agreement between the ab initio 3-21G and 6-31G*
results is excellent, especially since the O1-O2 bond length is closer to experiment
than in any of the semiempirical results. In artemisinin itself, the trioxane ring
adopts a twist-boat conformation, and all the methods give the torsion angles quite
well. It is interesting that both AM1 and PM3 give larger errors in the torsions than
do the ab initio methods. The next set of active and inactive molecules studied are
referred to as (j22), (j23), (j19), (j25), and (j26), and their formulas are given in
Figures 2-5. X-ray data are available, and Tables IV-VI compare the results for
(j22), (j23), and (j26).
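
The comparison of methods can be made concrete for a single parameter. The sketch below takes the O1-O2 bond lengths for artemisinin as tabulated in Table III and ranks the methods by absolute deviation from the experimental value; the helper name is hypothetical and the snippet is only an illustration of the bookkeeping.

```python
# O1-O2 bond lengths for artemisinin as tabulated in Table III (angstroms).
EXPT_O1_O2 = 1.478
calculated = {
    "AM1": 1.289,
    "PM3": 1.544,
    "ZINDO": 1.240,
    "6-31G*": 1.390,
    "3-21G": 1.462,
}


def abs_errors(calc, reference):
    """Absolute deviation of each calculated value from the reference value."""
    return {method: abs(value - reference) for method, value in calc.items()}


errors = abs_errors(calculated, EXPT_O1_O2)
for method, err in sorted(errors.items(), key=lambda kv: kv[1]):
    print(f"{method:7s} {err:.3f}")

best = min(errors, key=errors.get)
```

With the values as tabulated, the 3-21G result lies closest to experiment for this bond.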

There is again excellent agreement with the experimental data, and the ball-and-stick
pictures are very similar to those in [10]. Bond lengths are in error by <0.02
Å in most cases, angles are within 2°, and the torsion angles in all are of the correct
magnitude and sign, but individual angles may be in error by 2-3°, up to 7° for
(j26). It is important to stress that with molecules as flexible as these, better agreement
with experiment is not expected. The basic structure of all these molecules is
reproduced at the 3-21G level.
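
For readers who wish to reproduce torsion angles such as those tabulated here from Cartesian coordinates, a minimal sketch follows. It uses the standard atan2 form of the dihedral angle; the function names and the test coordinates are illustrative and are not taken from any of the programs cited in the text.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def torsion_deg(p0, p1, p2, p3):
    """Signed torsion angle p0-p1-p2-p3 in degrees, in the range (-180, 180]."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    # atan2 form: well behaved near 0 and 180 degrees, unlike a plain arccos.
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# Four points with a right-angle twist about the central bond.
print(torsion_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```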

The results of the 6-31G* calculations for (j22) and (j23) are also given in Tables
IV and V. The results are very similar to the 3-21G results, and this fact is in
accordance with other studies by the authors. In general, it is not necessary to carry
out 6-31G* optimizations. The structure obtained at the lower level of theory is


Table III. Comparison of calculated and experimental values of the 1,2,4-trioxane ring parameters in
artemisinin.

Parameter    AM1      PM3      ZINDO    Expt     6-31G*   3-21G
O1O2         1.289    1.544    1.240    1.478    1.390    1.462
O2C3         1.427    1.402    1.404    1.403    1.396    1.441
C3O4         1.427    1.428    1.402    1.437    1.408    1.436
O4C5         1.416    1.403    1.394    1.390    1.376    1.408
C5C6         1.537    1.555    1.499    1.529    1.532    1.529
O1O2C3       112.5    110.3    112.4    107.5    109.5    107.1
O2C3O4       103.6    104.8    106.7    107.3    107.8    107.3
C3O4C5       115.5    116.0    111.8    114.1    115.3    115.7
O4C5C6       113.5    115.2    114.1    113.3    112.3    112.1
O1O2C3O4     -77.7    -73.3    -76.9    --       -73.4    -74.6
O2C3O4C5     41.9     52.7     34.6     --       31.1     32.3
C3O4C5C6     11.5     2.8      21.1     --       27.4     28.3

Figure 3. MEP from the PM3 wave function for j19, j25, j22, and j23.


Figure 5. MEP from the PM3 wave function on the electron density surface for j19, j25,
j22, and j23.
Table IV. Selected bond lengths (Å) and angles (deg) for j22.

Parameter                  Expt         3-21G     Δ         6-31G*    Δ
C(1)-O(1)                  1.417(5)     1.442     +0.025    1.398     -0.019
O(1)-O(2)                  1.473(4)     1.465     -0.008    1.393     -0.08
O(2)-C(3)                  1.473(5)     1.493     +0.02     1.440     -0.033
C(3)-C(2)                  1.525(6)     1.529     +0.004    1.536     +0.011
C(2)-O(3)                  1.427(4)     1.434     +0.006    1.401     -0.026
O(3)-C(1)                  1.424(5)     1.433     +0.007    1.403     -0.021
O(4)-C(1)                  --           3.558     --        --
O(4)-C(2)                  1.391(5)     1.403     +0.012    1.371     -0.02
O(3)-C(1)-O(1)             107.7(2)     106.8     -0.9      107.5     -0.2
C(1)-O(1)-O(2)             110.1(2)     108.9     -1.3      111.1     +1.0
O(1)-O(2)-C(3)             112.5(3)     112.2     -0.3      113.8     +1.3
O(2)-C(3)-C(2)             106.2(3)     107.4     +1.2      106.8     +0.6
C(3)-C(2)-O(3)             112.5(2)     111.9     -0.6      111.7     -0.8
C(2)-O(3)-C(1)             113.3(3)     114.3     +1.0      114.4     +1.1
C(2)-O(4)-C(12)            113.0(3)     115.3     +2.3      115.3     +2.3
O(3)-C(1)-O(1)-O(2)        -71.5(3)     -73.2     -1.7      -71.5     0
C(1)-O(1)-O(2)-C(3)        39.9(3)      44.9      +5.0      42.6      +2.7
O(1)-O(2)-C(3)-C(2)        23.0(3)      +18.2     -4.8      19.3      -3.7
O(2)-C(3)-C(2)-O(3)        -61.5(3)     -58.8     -2.7      -57.5     -3
C(3)-C(2)-O(3)-C(1)        32.8(4)      33.0      +0.2      31.4      -1.4
C(2)-O(3)-C(1)-O(1)        32.1(3)      31.2      +0.9      30.2      -1.9
O(3)-C(2)-O(4)-C(12)       71.1(3)      58.0                64.4      -0.7
C(3)-C(2)-O(4)-C(12)       -166.7(3)    179.9               -172.9    -4
O(1)-O(2)-C(3)-C(8)        -102.6(3)    -105.4    -2.8      -104.5    -2
C(3)-C(8)-C(9)-C(10)       -41.9(5)     -44.1     -2.2      -41.1     -0
C(8)-C(9)-C(10)-C(1)       58.6(5)      62.8      4.2       57.5      -1
C(9)-C(10)-C(1)-O(1)       -95.0(4)     -97.2     2.2       -93.2     -2
C(10)-C(1)-O(1)-O(2)       50.8(3)      47.8      3.0       49.9      0.8

adequate  for  subsequent  property  calculations.  The  good  agreement  between  the 
ring  torsion  angles  and  experiment  is  particularly  gratifying,  since  it  can  be  concluded 
that  the  ring  structure  of  these  molecules  can  be  reproduced  with  some  reliability. 
Of  course,  the  other  conformers  of  these  rings  will  be  close  in  energy,  but  we  have 
not  so  far  looked  at  these  aspects  of  their  structure. 
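
When calculated and experimental torsion angles are compared, the difference should be taken on the circle, since values near +180° and -180° describe almost the same conformation. The sketch below shows one way to do this; the function name is hypothetical, and the example uses the C(3)-C(2)-O(4)-C(12) torsion of j22 from Table IV (experiment -166.7°, 3-21G 179.9°).

```python
def torsion_difference(calc_deg: float, expt_deg: float) -> float:
    """Signed difference calc - expt, wrapped into the range [-180, 180)."""
    return (calc_deg - expt_deg + 180.0) % 360.0 - 180.0


# A naive subtraction would give 346.6 degrees; on the circle the two
# values differ by only about 13 degrees.
print(round(torsion_difference(179.9, -166.7), 1))

# Ordinary cases are unaffected by the wrapping.
print(round(torsion_difference(-73.2, -71.5), 1))
```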

It is also of interest to note that in the case of the inactive (j26), our computations
of (j26) and (Pj26), in which the CH3 and CH3O groups are replaced by H, show
that these molecules have significant differences in structure, the (j26) results being
closer to experiment. Clearly, these substituents significantly alter the 1,2,4-trioxane
ring structure.

Turning now to (j19) and (j25), which are more closely related to artemisinin
itself, the X-ray structure of (j25) has been determined previously [5]. Our overall
results are again in good agreement, but when the two CH3 groups are replaced by
the cyclopentane ring, the trioxane ring conformation is altered (Table VII).



BERNARDINELLI  ET  AL. 


Table V. Selected bond lengths (Å) and angles (deg) for j23.

Parameter                  Expt         3-21G     Δ         6-31G*    Δ
C(1)-O(1)                  1.45(2)      1.436     -0.01     1.451     0
O(1)-O(2)                  1.47(1)      1.462     -0.01     1.396     -0.07
O(2)-C(3)                  1.48(1)      1.482     0         1.472     -0.01
C(3)-C(2)                  1.50(1)      1.529     +0.03     1.574     +0.07
C(2)-O(3)                  1.43(1)      1.428     0         1.440     +0.01
O(3)-C(1)                  1.47(1)      1.435     -0.04     1.441     -0.03
O(4)-C(1)                  --           3.176     --        3.548     --
O(4)-C(2)                  1.40(1)      1.406     +0.01     1.430     +0.03
O(3)-C(1)-O(1)             106(1)       107.7     +1.7      108.0     +2.0
C(1)-O(1)-O(2)             109.8(7)     108.3     -1.5      108.8     -1.0
O(1)-O(2)-C(3)             111.6(7)     112.0     +0.4      110.6     -1.0
O(2)-C(3)-C(2)             109.7(7)     110.5     +0.8      108.8     -0.9
C(3)-C(2)-O(3)             113.7(8)     111.4     -2.3      112.9     -0.8
C(2)-O(3)-C(1)             113.7(9)     115.5     +1.8      110.2     -3.5
C(2)-O(4)-C(12)            113.9(9)     115.6     +1.7      110.2     -3.7
O(3)-C(1)-O(1)-O(2)        -73.4(8)     -73.0     -0.4      -79.3     -6.1
C(1)-O(1)-O(2)-C(3)        43(1)        46.6      +3.6      50.9      9.0
O(1)-O(2)-C(3)-C(2)        20(1)        14.6      -5.4      12.0      8.0
O(2)-C(3)-C(2)-O(3)        -57(1)       -53.9     -3        -56.3     -0.7
C(3)-C(2)-O(3)-C(1)        27(1)        29.4      +2.4      31.1      4.1
C(2)-O(3)-C(1)-O(1)        35(1)        31.9      -3.1
O(3)-C(2)-O(4)-C(12)       -61(1)       -55.0     -6.0      -64.2     -3.0
C(3)-C(2)-O(4)-C(12)       173.0(9)     -178.1    -5.0      -172.7    -0.3
O(1)-O(2)-C(3)-C(8)        -103.3(9)    -106.7    -3.4      -110.8    -7.5
C(3)-C(8)-C(9)-C(10)       -40(1)       -41.1     -1.1      -42.2     -2.2
C(8)-C(9)-C(10)-C(1)       57(1)        61.0      +3.0      61.7      4.7
C(9)-C(10)-C(1)-O(1)       -95(1)       -97.4     -2.4      -94.9     +0.1
C(10)-C(1)-O(1)-O(2)       49(1)        47.7      -1.3      42.8      6.2

In (j25), the ring is quite close to the boat structure, but in (j19), the cyclopentane
ring modifies this to a distorted twist-boat structure. The differences in the torsion
angles are as much as 18°. Table VIII gives the 1,2,4-trioxane ring parameters in
the above compounds.

In summary, these results show that ab initio SCF computations with a
split-valence basis set and complete geometry optimization can reproduce the
experimental results for molecules of this complexity. It is important to emphasize that
the starting geometry is not the X-ray structure, although it could have been. We
simply build a chemically reasonable structure with a suitable molecular modeling
package, in our case, the CAChe system, and then optimize the structure as described
above. The computed structures were, of course, verified to be minima using the
force option. From these results, we are confident that even large molecular
structures can be reliably computed using the direct SCF procedure on supercomputers
such as the SX-3. (Computer times for the 3-21G optimizations were up to a few
hours on the SX-3 for the largest molecules.)
Table VI. Selected geometrical parameters for Pj26 and j26.


Expt (j26)  Pj26 3-21G  Δ  j26 3-21G


C(l)-0(1) 

0(1) — 0(2) 

0(2) — C(3) 

C(3) — C(2) 

C(2) — 0(3) 

0(3) — C(  1) 

0(4) — C(l) 

0(4) — C(2) 

0(3) — C(l) — 0(1) 

C(l) — 0(1) — 0(2) 

0(1) — 0(2) — C(3) 

0(2) — C(3) — C(2) 

C(3) — C(2) — 0(3) 

C(2) — 0(3) — C(  1 ) 

C(2) — 0(4) — C(  1 2) 

0(3) — C(  1 ) — 0(  1 ) — 0(2) 
C(  1 ) — 0(  1 ) — 0(2) — C(3) 
0(  1 ) — 0(2) — C(3) — C(2) 
0(2) — C(3) — C(2) — 0(3) 
C(3) — C(2) — 0(3) — C(  1 ) 
C(2) — 0(3) — C(  1 ) — 0(  1 ) 
0(3)— C(2)— 0(4)— C(12) 
C(3)— C(2)— 0(4)— C(12) 
0(  1 ) — 0(2) — C(3) — C(8) 
C(3) — C(8) — C(9) — C(  1 0) 
C(8) — C(9) — C(  10) — C(  1 ) 
C(9) — C(  1 0) — C(  1 ) — 0(  1 ) 
C(10) — C(l) — 0(1)— 0(2) 


1.44(1) 

1.451 

1.467(7) 

1.464 

1.45(1) 

1.462 

1.52(1) 

1.515 

1.18(2) 

1.208 

— 

2.307 

1.41(1) 

— 

_ 

(37.2) 

111.5(6) 

108.3 

105.3(6) 

104.1 

106.6(7) 

105.3 

128(1) 

125.8 

— 

(49.6) 

_ 

(-95.1) 

108.6(8) 

110.8 

44.2(8) 

44.1 

-138.1(9) 

-143.8 

— 

(71.3) 

— 

(-40.8) 

-82.9(7) 

-81.9 

-72(1) 

-72.1 

85(1) 

82.3 

-34(1) 

-27.1 

-51(1) 

-58.2 

+0.01 

1.450 

+0.003 

1.463 

+0.01 

1.464 

-0.01 

1.515 

+0.02 

1.208 

— 

4.306 

— 

1.408 

— 

4.744 

-3.5 

(39.5) 

110.8 

-1.2 

104.6 

-1.3 

105.5 

-2.2 

125.9 

+2.2 

(51.3)

(101.5) 

(-95.9) 

108.7 

-0.1 

42.8 

-4.7 

-142.3 

-1.0 

(71.9) 

(27.5) 

(-126.6) 

(121.0) 

-83.4 

-0.1 

-70.2 

-3.7 

84.6 

-7.0 

-33.6 

-6.8 

-51.5 

Structure/Activity Correlations

The main aim of this project was to try to rationalize the observed S/A data on
these compounds, which at first sight are not very similar. The 1,2,4-trioxane ring
is, however, essential, and our expectation was that the features of the MEP in this
region would be informative in this respect.

We previously calculated the MEP on the Connolly surface for artemisinin and
desoxyartemisinin from semiempirical SCF wave functions, using AM1 and PM3
Hamiltonians [6]. These results, using MEP computed from the wave function by
the multipole expansion method, indicated significant differences in the MEP maps
between the active and inactive molecules. The appearance of the maps depends,
of course, on the direction of viewing, but it does seem that the negative potential
region near the trioxane ring is narrower in the inactive molecule. This is actually
seen very strikingly in the ZINDO MEP.

We believe, however, that more reliable MEPs can be obtained from ab initio
calculations performed using the structures computed as described above.
Table VII. Selected bond lengths and angles in j19 and j25.


j19

Expt(a)

j25

C(l)— 0(1) 

1.434 

1.412 

1.449 

0(1) — 0(2) 

1.461 

1.477 

1.464 

0(2) — C(3) 

1.480 

1.472 

1.487 

C(3) — C(2) 

1.541 

1.528 

1.532 

C(2) — 0(3) 

1.416 

1.403 

1.418 

0(3) — C(l) 

1.439 

1.447 

1.450 

014— C5 

1.419 

— 

1.419 

0(3) — C(  1) — 0(1) 

108.5 

109.6 

107.4 

C(l) — 0(1) — 0(2) 

105.8 

107.5 

0(1) — 0(2) — C(3) 

109.0 

111.1 

0(2) — C(3) — C(2) 

108.1 

105.6 

108.4 

C(3)— C(2)— 0(3) 

107.0 

111.3 

108.0 

C(2) — 0(3) — C(  1) 

115.1 

116.0 

116.2 

C(2)— 0(4)— C(12) 

120.5 

114.1 

115.3 

0(3) — C(  1 ) — 0(  1 ) — 0(2) 

-48.7 

61.3 

-58.3 

C(  1 ) — 0(  1 ) — 0(2) — C(3) 

76.4 

-71.9 

67.3 

0(1)— 0(2)— C(3)— C(2) 

-29.4 

65.9 

-11.2 

0(2)— C(3)— C(2)— 0(3) 

-35.6 

-55.1 

-48.2 

C(3)— C(2)— 0(3)— C(l) 

65.2 

48.2 

59.0 

C(2) — 0(3) — C(  1 ) — 0(  1 ) 

-20.3 

-51.5 

-4.4 

0(3)— C(2)— 0(4)— C(12) 

154.0 

176.4 

171.0 

C(3) — C(2) — 0(4) — C(  1 2) 

34.3 

49.8 

0(1) — 0(2) — C(3) — C(8) 

— 

— 

C(3) — C(8) — C(9) — C(  10) 

— 

— 

C(8) — C(9) — C(  10) — C(  1 ) 

— 

— 

C(9) — C(  1 0) — C(  1 ) — 0(  1 ) 

— 

— 

C(10) — C(l) — 0(1)— 0(2) 

— 

— 

a  Values  for  j25. 


We  have  the  opportunity  in  this  way  to  improve  the  wave  function  by  using  a  larger 
basis  set,  if  necessary. 

The MEP can be computed ab initio in G92, and we have such studies underway,
but we wish first to evaluate the use of MEP computed from Mulliken charges and
also by the Merz-Kollman procedure in the S/A correlations. In the Merz-Kollman
procedure, the charges are derived from a fit to the ab initio computed potential
on a grid around the molecule, and this may represent a more reliable method of
computing the MEP than using semiempirical wave functions. Therefore, for the
molecules described above, we proceeded as follows:

We first compute the optimized structure, and from the wave function at this
geometry, a Mulliken population analysis and a Merz-Kollman analysis are carried
out. The charges are then used to compute the MEP on a Connolly surface using
the MOLEKEL program.
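
The last step of this procedure, evaluating the potential from a set of atomic point charges, can be sketched as follows. This is a minimal illustration in atomic units and not the MOLEKEL implementation; the function name, the charges, and the coordinates are hypothetical.

```python
import math


def point_charge_potential(charges, positions, point):
    """Electrostatic potential V(r) = sum_i q_i / |r - R_i| in atomic units.

    charges: atomic charges in units of e (e.g., Mulliken or fitted charges)
    positions: atomic coordinates in bohr
    point: the surface point (bohr) at which the potential is evaluated
    """
    total = 0.0
    for q, pos in zip(charges, positions):
        distance = math.dist(point, pos)
        if distance == 0.0:
            raise ValueError("surface point coincides with an atomic position")
        total += q / distance
    return total


# Hypothetical three-site fragment: two negative sites flanked by a positive one.
charges = [-0.30, -0.30, 0.60]
positions = [(0.0, 0.0, 0.0), (2.8, 0.0, 0.0), (1.4, 2.0, 0.0)]
surface_point = (1.4, -3.0, 0.0)
print(point_charge_potential(charges, positions, surface_point))
```

Repeating the evaluation over all points of a molecular surface gives the map that is then color coded for display.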

The results of the calculations are now described. There is no correlation between
the value of the MEP minimum and activity, but there are interesting variations in the
Table VIII. 1,2,4-Trioxane ring parameters in different molecules.

Parameter    Artemisinin    j22      j23      j19      j25
O1O2         1.462          1.465    1.462    1.461    1.460
O2C3         1.441          1.442    1.463    1.434    1.450
C3O4         1.436          1.433    1.435    1.430    1.439
O4C5         1.408          1.434    1.428    1.416    1.418
C5C6         1.529          1.529    1.529    1.541    1.540
O1C6         1.477          1.492    1.482    1.480    1.479
O1O2C3       107.1          108.9    108.3    105.8    106.1
O2C3O4       107.3          106.8    107.7    108.4    107.9
C3O4C5       115.7          114.3    115.5    115.2    115.5
O4C5C6       112.1          111.9    111.3    107.0    107.4
C5C6O1       111.6          107.4    110.5    108.1    108.2
O1O2C3O4     -74.6          -73.2    -73.0    -48.7    -49.7
O2C3O4C5     32.3           31.2     31.9     -20.3    -18.8
C3O4C5C6     28.3           32.9     29.4     65.2     64.5
O4C5C6O1     -50.9          -58.8    -53.9    -35.6    -36.4
C5C6O1O2     10.0           18.2     14.6     -29.4    -28.2
C6O1O2C3     50.3           45.0     46.6     76.4     76.0


maps in this series. It should be noted that only (j26) and desoxyartemisinin are
totally inactive, but because of inherent uncertainties in the biological data, it is
reasonable to conclude that although artemisinin is the most active, (j19) and (j22)
are of very similar activity, closely followed by (j25).
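
The activity ordering can be read directly from the W2-clone IC50 values in Table I by expressing each value relative to artemisinin. The sketch below does only that; the helper name is hypothetical, and no allowance is made for the experimental uncertainties mentioned above.

```python
# IC50 values (ng/mL) against the W2 clone, from Table I.
W2_IC50 = {"1": 1.1, "19": 2.0, "22": 1.8, "25": 6.2, "23": 9.7}


def relative_ic50(ic50, reference="1"):
    """IC50 of each compound divided by that of the reference compound."""
    ref = ic50[reference]
    return {name: value / ref for name, value in ic50.items()}


ranked = sorted(relative_ic50(W2_IC50).items(), key=lambda kv: kv[1])
for name, ratio in ranked:
    print(f"compound {name:>2s}: {ratio:.1f} x the IC50 of artemisinin")
```

A lower ratio means higher potency, so on this measure (22) and (19) lie within a factor of two of artemisinin.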

Figure 6 shows the MEP of these compounds. The details vary over the surface,
but all the active molecules have a region of negative potential of similar shape
near the trioxane ring, whereas this region is displaced in the inactive compounds. This
negative region is due to the peroxide linkage and is the most noticeable feature of
the MEP, which is more similar in the cases of (j19), (j25), (j22), and (j23) than
in artemisinin itself, because these four molecules have lost the >C=O group that
is present in artemisinin.


Discussion 

The results obtained in this study show both the strengths and weaknesses of the
use of quantum chemistry calculations in attempting to rationalize S/A data for
large molecules. First, we are able to reliably compute the structures of these large
molecules with quite large basis sets, and the structures are likely to be quite reliable
for those molecules for which there are no experimental data.

Second, the wave functions calculated with this size of basis set are probably
accurate enough that the computed MEP are reasonable approximations to the
"exact MEP" and should reflect the potential felt by whatever the molecule interacts
with in vivo. Unfortunately, although the active molecules have similar MEP around
the essential trioxane ring, it has not proved possible to quantitatively compare

Figure 6. MEP computed from the ab initio 6-31G* wave function using the MOLEKEL
program: j19, j25, j22, j23, artemisinin, and desoxyartemisinin.


these to our satisfaction, and, hence, to use this type of approach to predict new
and more effective molecules on the basis of their MEP. Partly, this is because
quantitative similarity indices cannot be evaluated with enough precision. However,
we are confident that any new active molecules should have MEP that are similar
to those found for artemisinin and the new Jefford molecules referred to above.

A further difficulty in this study is the lack of information on the detailed mechanism
of action of these compounds. Jefford et al. [10] postulated that the active molecule
interacts with haem to undergo electron transfer to the trioxane ring, resulting in ring
opening to give a radical anion. We earlier showed that addition of an electron to
1,2,4-trioxane results in ring opening when we carry out optimization of the anion at the PM3
level. We confirmed this in ab initio calculations using the 6-31+G basis set; thus, this
mechanism may be a plausible one. We hope that we may carry out further calculations
on the interactions of the model antimalarials with a haem model in the near future.
Nevertheless, we believe that this study has resulted in some qualitatively useful
information on the MEP of these molecules that should be of use in the design of new and
active analogs of these interesting molecules.
Abstract

A theoretically contracted agonist conformation of potent phenoxypropanolamine derivatives on the
β1-adrenoceptor has been analyzed in detail. The main effect of the enthalpic contraction of some
6.0-7.0 kcal/mol arises from the movement of the nitrogen atom toward the aromatic ring by 0.7-0.8 Å,
requiring some 3.0-3.5 kcal/mol. A second effect arising from the contraction can be a dihedral rotation
of some 30° around the O-CH2 bond of the planar anisole moiety. This rotation is correlated with an
effect arising from "in-plane" deformation of the anisole moiety where opening of the relevant bond
angle releases steric constraints for this rotation. Ortho-substituents assist this rotation indirectly through
hyperconjugation with the lone pair of the OCH2 group, electron-attracting substituents opening this
bond angle and lowering the energy required to reach a given bond-angle deformation. The adjacent
ring meta-substituent can be similarly affected, the strength of the total effect being also of the order of
3.0-3.5 kcal/mol. The net effect gives rise to a further contraction of the nitrogen atom and the
β-hydroxyl group toward the aromatic ring, the β-hydroxyl group showing a contraction of up to
0.4-0.5 Å along the main axis of the conformer. The deformed conformation is consistent with the predicted
conformer of a fixed-ring benzdioxepine molecule that possesses the highest degree of partial agonism
within the set of phenoxypropanolamine agents. It is concluded that ortho-substituents in
phenoxypropanolamine derivatives can retain steric freedom in both agonist and antagonist action provided that
the substituent can accommodate the required deformation, both agonist and antagonist conformer
forms lying within one unbound conformation. The agonist conformer is consistent with the proposed
model for the ligand-activated transmembrane proton transfer in the β1-adrenoceptor where a contraction
along the main axis of the ligand conformer (with some attendant distortion in the position of the
β-hydroxyl moiety) is required to activate proton transfer. © 1994 John Wiley & Sons, Inc.

1.  Introduction 

Comparative thermodynamics of the binding of agonists and antagonists in
ligand-receptor complexes and in site-mutagenically modified receptors offer the
most accurate experimental evidence for gaining a detailed understanding of
molecular mechanisms in signal transduction at the current time. Small perturbations
to the binding ligand can throw light on the consistency of the binding mode of
the ligand molecule with respect to the perturbations, on its conformation, and on
the excitation of specific electrostatic and hydrogen-bond interactions, whereas in
the region of the perturbing moieties, the effect of simple changes in the phase
environment may be readily identified [1]. Under defined conditions, precise
interatomic distance constraints (<0.5 Å) on receptor site atoms may be obtained if
the conformation can be satisfactorily identified among other factors pertaining to
the given mode of binding. These interatomic distance constraints are not absolute
quantities but are relative to the mode of binding of the ligand and of the receptor.

In the case of the β1-adrenoceptor, the identification of the phenolic oxygen atom
of Tyr377 as an electronegative atom capable of lying at a known geometric distance
(10.3 ± 0.3 Å) from an oxygen atom of Asp138 in agonist action provided the basis
for the development of an explicit transmembrane proton transfer model through
α-helices III, IV, V, VI, and VII [2,3]. The interatomic distance constraint was
based on a proposed agonist conformer of a phenoxypropanolamine molecule where
a contraction along the main axis of the conformer [4] altered the form to approach
the conformation of a known ethanolamine agonist isomer [5] (with some attendant
distortion in the position of the β-hydroxyl moiety). The hypothesis of proton
transfer through a Tyr377-Arg156-Tyr157 proton shuttle led to a proposed alignment
of the β1-adrenoceptor α-helices on the bacteriorhodopsin model and the wider
possibility that through a correlated gating mechanism an intermittent ion channel
through helices I, II, III, and VII might be developed within the receptor [2].

Such a model should be consistent with available thermodynamic data on partially
stimulating ligands bound to the β1-adrenoceptor. Thermodynamic differences
between agonist and antagonist binding modes may be explicitly calculated within
the model, but a precisely determined form of the contracted phenoxypropanolamine
agonist conformer is required for good accuracy. In this article, evidence is
examined for the constraints on individual dihedral angles of the
phenoxypropanolamine side chain, consistent with the overall energetics of contraction.

2. The β1-Adrenoceptor Transmembrane Proton Transfer System

The CHARMM-based [6] model structures for the resting and activated states
of the β1-adrenoceptor using the agonist isoprenaline* were described previously.
The structures are reproduced in Figure 1. In the resting state, Trp183 restrains the
Tyr377 oxygen atom at the CHARMM distance of 5.2 Å from the Asn373 amidic
carbonyl group. A hydrogen-bond relay, Asn369-Ser370-Ser145-Asn373, is held at its
extremities by an Asn369-NH hydrogen-bond proton donor interaction with the
carbonyl group of Pro339 and by an Asn373 carbonyl group interaction with Cys336.
Asp138 is the recognition site for the ligand amine moiety [7], the binding at this
site, in turn, juxtaposing the β-hydroxyl group of the ligand with Asn369. Activation
of the Tyr377 phenolic oxygen atom is achieved by two hydrogen-bond proton donor
interactions of the ligand, one directly by the para-hydroxy moiety of isoprenaline,
the other indirectly by the β-hydroxyl group's interaction with Asn369, inducing
rotation and hydrogen-bond reversal in the relay with rotation of Asn373. Movement
of the Tyr377 phenolic oxygen atom and shuttle system toward the establishment
of the two hydrogen bonds with the rotated Asn373 and the para-hydroxyl
group of the ligand liberates Trp183, which moves to activate Tyr157.

One interpretation for the proposed proton shuttle of Tyr377-Arg156-Tyr157 has
the labile Tyr377 phenolic hydrogen atom interacting through a hydrogen-bond
proton donor interaction with the unprotonated Arg156, whereas a further proton
donor interaction of the latter is to Tyr157. An alternative is that Tyr377-Arg156 exists
as an ion pair near the cytoplasmic interface with Tyr157 protonated, the displacement
and movement of Trp183 transferring the ion pairing to Arg156-Tyr157. Recovery
can again utilize the position of Cys336, which, in the activated state, can act as a
catalyst for the recovery of the resting state through a transient role as the thiolate
ion [3].

The  main  features  of  the  receptor  site  for  the  isoprenaline-activated  state  are 
summarized  in  Figure  2(a).  A  comparative  CHARMM-optimized  position  for  an 
uncontracted  prenalterol  conformer  possessing  an  equivalent  conformation  to  that 
for  the  isoprenaline  is  shown  in  Figure  2(b).  Rotation  of  Asn373  produces  a  very 
weak  NH-proton  donor  interaction  with  the  ligand  itself,  and  activation  of  Tyr377 
is effectively weakened by one N-H···O Tyr377 interaction. Further movement
of  the  proton  shuttle  is  precluded  without  contraction  of  the  phenoxypropanolamine 
molecule. 

3. The Bound Agonist Conformation of β1-Adrenoceptor Phenoxypropanolamine
Partial Agonists: Energetics of “In-Plane” Distortion of the Phenoxy Moiety

Synthetic and pharmacological evidence has pointed to the side-chain conformation
of ethanolamine and phenoxypropanolamine molecules being very similar
in their bound agonist forms. The agonist conformation of flexible ethanolamine
analogs is given by the fixed side-chain agonist isomer [4], whereas the known
planarity of the anisole moiety in agonist and antagonist action [8], the lack of any
electrostatic interaction of this group with the receptor, and the nonrelevance of
folded intramolecular hydrogen-bonded forms in the agonist action of phenoxypropanolamine
derivatives (phenoxypropanolamine and phenylaminopropanolamine
analogs exhibit very comparable binding and partial agonism [9]; refer to
Table I) reduce the likely active conformers to the equivalent ethanolamine form.

Contraction along the main axis of the phenoxypropanolamine molecule to initiate
stimulus action in the β1-adrenoceptor was based on the enthalpic differences
observed between comparable phenethanolamine and phenoxypropanolamine agonist
conformer interactions (5.5-7.0 kcal/mol) [4]. A theoretical 1.0 Å contraction
in the position of the amine moiety toward the aromatic ring (with some distortion
in the position of the β-hydroxyl moiety) retaining the same conformation required
an estimated minimal basis STO-3G energy of 4.6 kcal/mol. Differences in vibrational
contributions to the binding between the relaxed and contracted forms were
negligible.

These calculations were performed on the vinyl ether analog conformer using
the unprotonated molecule in view of the charge compensation from the ion-pair
interaction at Asp138. The main effect arises from rotation about the bond governing
the position of the nitrogen atom, bringing this atom to an eclipsed position with
the hydrogen atom H1 [Fig. 3(a)]. The 1.0 Å contraction is slightly too large, the
nitrogen atom coming past the eclipsed form into part of the conformer space of
a nonrelevant intramolecular hydrogen-bonded form, but in the eclipsed form, an
overall contraction of 0.7-0.8 Å would require 3.0-3.5 kcal/mol. If the ethanolamine
agonist conformation is strictly maintained by constraining dihedral angles, a contraction
of only 0.3 Å is achievable for an STO-3G expenditure of energy of 5.5
kcal/mol, the three bond angles governing this contraction showing a general narrowing
by 4.0-4.5°.
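The enthalpy bookkeeping implied by these figures can be sketched as follows. This is only an illustration, assuming that the lower and upper bounds of the quoted ranges are paired with each other (which is how the 2.5-3.5 kcal/mol remainder discussed in the text is obtained); the function name is hypothetical and not part of the original work.

```python
def pair_bounds_difference(total, part):
    """Subtract two (low, high) ranges by pairing low with low and high with high.

    This mirrors the simple bookkeeping used in the text; it is not
    rigorous interval arithmetic.
    """
    return (total[0] - part[0], total[1] - part[1])

# Values quoted in the text (kcal/mol).
observed_enthalpy = (5.5, 7.0)      # phenethanolamine vs phenoxypropanolamine difference [4]
eclipsed_contraction = (3.0, 3.5)   # STO-3G cost of the 0.7-0.8 angstrom eclipsed contraction

remaining = pair_bounds_difference(observed_enthalpy, eclipsed_contraction)
print(remaining)
```

The remainder is the enthalpy that the text goes on to associate with “in-plane” deformation of the phenoxy moiety.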

There remains a 2.5-3.5 kcal/mol enthalpy to be accounted for in the energetics
of the conformer contraction. Within the model, and the known planarity of the
anisole moiety, the contraction can be associated with an expected “in-plane” deformation
of the aromatic phenoxy moiety. Two effects may be anticipated from
this distortion: One is model-dependent on the siting of the second electronegative
receptor site atom, giving rise to the approximately one-dimensional conformer
contraction [Fig. 1(b)]. The positioning of this atom was based on the known interactions
of practolol [10], prenalterol, and pindolol [4] and on the van der Waals
contacts of meta- and para-hydrogen atoms of ortho-substituted partial agonists.
(Molecular structures are identified in Table I. Practolol is the 4-NHCOCH3 analog.)
Moderate “in-plane” distortion of the meta- and para-hydrogen atoms was anticipated
from the contraction. The second effect arises from “in-plane” deformation
of the planar -OCH2 group itself, which can affect the orientation of the aromatic
moiety with respect to the propanolamine side chain. More importantly, for achieving
a required conformation with a planar -OCH2 moiety, it facilitates moderate
movement of rotation about the C7-O3 bond [Fig. 3(b)] by opening the relevant
bond angle in the undeformed molecule with the consequent slight lowering in
energy required to reach the given conformation. Since both effects are sensitive
to the presence of an ortho-substituent, the influence of this group on the aromatic


Figure 1. (a) Resting state of the β1-adrenoceptor. Side view showing the residues involved
in the proposed mechanism of proton transfer. Trp183 holds Tyr377 at an oxygen-oxygen
atom distance of 5.2 Å from Asn373. The three residues near the cytoplasm are Tyr377,
Arg156, and Tyr157. (b) Superimposition of phenethanolamine (dotted) and phenoxypropanolamine
conformers within the β1-adrenoceptor site. The estimated enthalpy of contraction
of the phenoxypropanolamine agonist conformer to interact with two electronegative
receptor sites Oa and Ob is some 6-7 kcal [4]. Two possible positions (1 and 2)
exist for the receptor oxygen atom Oa where the atom is predicted to be in van der Waals
contact with the meta- and para-hydrogen atoms of the para-unsubstituted phenoxypropanolamine
ligand (the β-hydroxyl group of the latter has been omitted for clarity). (c,d)
Isoprenaline-β1-adrenoceptor activated state showing reversal of the Asn369-Ser370-Ser145-
Asn373 hydrogen-bond relay and movement of Trp183 to hydrogen bond with Tyr157.
[Figure 2, panels (a) and (b). Panel title: An antagonist binding mode for prenalterol (cf. isoprenaline)]

Figure 2. (a) Positions of the hydrogen-bond relay Asn369-Ser370-Ser145-Asn373 and of
the proton shuttle Tyr377-Arg156-Tyr157 in the isoprenaline conformer β1-adrenoceptor
activated state. (b) Relative positions of the hydrogen-bond relay and proton shuttle for
the isoprenaline conformer activated state (white) and for an equivalent prenalterol conformer
in the extended form (yellow) in one possible antagonist mode of binding.

ring deformation is examined in detail. The ortho-substituents in sets of phenoxypropanolamine
agents provide ready control of the degree of stimulant activity
within β1-adrenoceptor complexes, and this influence has been the subject of earlier
conformational hypotheses [1,12].

Two-substituted phenols were considered an adequate model for assessing the
salient features of “in-plane” aromatic ring distortion. The primary effect of distortion
of aryl hydrogen atoms in the x-y plane of the aromatic ring of phenols
and anisoles is that of σ-bond distortion. For small distortion, bond lengths will
change little, and the π-bond overlap will be relatively unchanged. The effects of a
distortion should be reproduced, therefore, even with limited basis calculations.
Effects of overestimation of π donation with a vacant p orbital effectively expanding
the basis set available for π-electron stabilization [13] will have no direct relevance
to the deformation, but could have secondary effects through changes in intrinsic
bond lengths. STO-3G calculations on experimental mesomeric effects are in good
Table I. Stimulus response and binding of some phenoxypropanolamine derivatives on the cardiac
β1-adrenoceptor:

Compound                              δΔG (a)     i.s.a. (b)          Stimulus function
                                      (kcal)      (heart beats/min)   -RT log [e/(1 - e)] (c) (kcal)

4-OH (Prenalterol)                    -0.8        129 ± 5
Benzdioxepine                         Not known   151
4-CH3                                 +2.7        Nonstimulating
2-F                                   0           117 ± 2             -0.02 ± 0.04
2-H                                   0           104 ± 7             +0.12 ± 0.075
2-OCH3                                0           101 ± 7             +0.15 ± 0.075
2-CF3                                 0           90 ± 5              +0.27 ± 0.06
2-NO2                                 0           66 ± 5              +0.56 ± 0.065
2-CH3                                 0           65 ± 11             +0.575 ± 0.15
2-C2H5                                0           29 ± 7              +1.19 ± 0.17
2-COCH3                               0           74 ± 5
2-COCH3 (phenylaminopropanolamine)    ~0          61
2-CONH2 (phenylaminopropanolamine)    ~0          76

(a) The relative free energies of binding (-RT log K) are referenced to a nonaqueous environment using
the long-chain ester PGDP/water model [11].

(b) Intrinsic sympathomimetic activities (i.s.a.) and the resultant stimulus functions are given for the rat
heart. The maximum incremental response (eA) is ~230 beats/min [8].

(c) e = eB/eA, where eB and eA are the maximum stimulatory responses of the partial and full agonists,
respectively.


agreement in monosubstituted benzenes [14], whereas methoxy group nonplanarity
in ortho-dimethoxybenzenes is also well represented in this basis [15].

All molecular structures, subject to the given constraints, were fully optimized
at the HF-SCF level using the Gaussian suite of programs. Early work with 4-31G
and STO-3G basis sets employed Gaussian 80 and Murtagh-Sargent optimization.
Structures and harmonic force field data for the 2-CH3, 2-H, and 2-F analogs together
with calculations at the 6-31G** level and data for substituted anisoles utilized
analytic gradient techniques with Gaussian 82 and 88 on a Convex C220 machine
[16]. Bond lengths and bond angles were determined to <0.0005 Å and <0.1°, respectively.
In all calculations on phenol derivatives, the planarity of the phenol
skeleton was retained with the phenolic hydrogen atom fixed at a bond angle of
115°, similar to the value found in phenoxymethyl moieties [15].
[Figure 3, panel (b). Panel title: Prenalterol, 30° bond rotation around the C7-O3 bond]
Substituted phenolic structures were optimized subject to the following constraints:
(1) none, (2) 3-H atom distorted 15° (the labeling of hydrogen atoms given
from the 2-substituent), (3) the 3-H and 4-H atoms both distorted 7°, (4) the 5-H
atom distorted 15°, (5) the 4-H and 5-H atoms both distorted 7°, and (6) the 3-H
and 5-H atoms both distorted 7° (Tables II and III). Table IV gives optimized
geometries for 2-substituted anisoles and for marginal “in-plane” deformation of
the 1-OCH3 group.

The energetics of distortion at the free-energy level contains enthalpic and entropic
contributions from vibrational differences between the deformed and undeformed
structures, together with a small difference in the zero-point energies. For the 2-H
and 2-CH3 phenol conformers, relevant differences in the STO-3G normal vibrational
energy modes of the two forms were found to be small (Δ ≈ 10 cm-1) and, as
expected, the lowest vibrational normal modes (260 cm-1) were too high for a
significant entropic difference between the deformed and undeformed vibrational
forms. Vibrational energy differences have therefore been neglected. An analysis
of the vibrational modes in the phenoxypropanolamine side chain [4] showed that
in spite of entropic contributions from much weaker vibrational modes the STO-3G
estimate of the difference in the vibrational entropy between the deformed and
undeformed species was only 0.15 kcal/mol.
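The argument that a 10 cm-1 shift in a mode near 260 cm-1 carries little entropy can be illustrated with the standard harmonic-oscillator expression for the vibrational entropy of a single mode. The sketch below is not the authors' calculation: the temperature of 310.15 K is taken from the Figure 6 caption, the choice of 260 and 270 cm-1 as the two mode positions is an assumption made for illustration, and the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)
HC_OVER_K = 1.438777   # second radiation constant hc/k, cm K

def ts_vib(wavenumber_cm, temperature_k=310.15):
    """T*S (kcal/mol) of one harmonic vibrational mode.

    Uses S/R = x/(exp(x) - 1) - ln(1 - exp(-x)) with x = hc*nu/(k*T).
    """
    x = HC_OVER_K * wavenumber_cm / temperature_k
    s_over_r = x / math.expm1(x) - math.log1p(-math.exp(-x))
    return R_KCAL * temperature_k * s_over_r

# Assumed illustration: one mode moving from 260 to 270 cm-1 on deformation.
shift = ts_vib(260.0) - ts_vib(270.0)
print(f"T*S at 260 cm-1: {ts_vib(260.0):.3f} kcal/mol")
print(f"Change for a 10 cm-1 shift: {shift:.3f} kcal/mol")
```

Under these assumptions the change per mode comes out at a few hundredths of a kcal/mol, which is small on the scale of the deformational energies discussed in the text.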

The deformational energies are very comparable with all three basis sets. In the
STO-3G minimal basis set with the 2-NO2 group, energy optimizations gave very
shallow minima. In the unconstrained molecule, the -NO2 group was orientated
21° to the plane of the aromatic ring, whereas in the constrained form (2), this
orientation was reduced to 5° and effective planarity in agreement with the unconstrained
2-nitro anisole structure (Table IV). Comparable X-ray data suggest
larger variation (52° [17,18] and 32° [19]) in the effective orientation of the nitro
group under the influence of crystal forces. General trends here may not be well
represented.

Table II shows the electronic influences of the ortho-substituent on the deformational
energy of 3H, and 3H, 4H in-plane aromatic ring distortion to be dominant.
For the three conformers of 2-CH3 phenol, the similarity in the STO-3G
deformational energy of 6.3 kcal for the constrained form (2) indicates that steric
contact between the substituent and the phenolic oxygen atom is not an intrinsic
problem in these forms, in agreement with van der Waals contact distances. Marginal
steric contact in the deformed 2-CF3 dominant conformer may have an influence


Figure 3. Contracted agonist conformer in phenoxypropanolamine binding. (a) The predominant
effect of a 3.0-3.5 kcal/mol enthalpic contraction (yellow) on the prenalterol
conformer equivalent to that for isoprenaline. In the contracted conformer, the nitrogen
atom is eclipsed to hydrogen atom H1. (b) Indirect effect (pink) of a 3.0-3.5 kcal/mol
enthalpic “in-plane” deformation of the planar -OCH2 moiety on the eclipsed conformer
(a) showing a 30° rotation about the C7-O3 bond. (c) The result of a 26° dihedral increment
around the C7-C8 bond (green) to minimize differences from the isoprenaline staggered
conformer. (d) Comparison of relevant intermolecular distances in molecular overlap between
the expected agonist conformers for prenalterol and isoprenaline.
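[Tables II and III: column headers and most rows were not recovered. Two surviving rows of total
energies (au) with the corresponding deformational energies (kcal/mol):]

-305.12501  -305.11632  +5.45  -301.72830  -301.71879  -301.72419  +6.0  +2.5

-403.84870  -403.84220  +4.1  -399.18374  -399.17589  -399.18005  +4.9  +2.3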
Table IV. Geometry-optimized STO-3G structures for 2-substituted anisoles with given “in-plane”
deformational constraints on the 1-OCH3 substituent [dihedral angle τ 14-8-7-6; (7) χ optimized; (8)
χ = 113°; (9) χ = 109°]:

R       E(7) (au)      χ (°)    θ (°)    E(8) (au)       χ (°)   θ (°)    ΔE(8-7) (kcal/mol)

H       -340.31184     114.6    114.8    -340.31166      113     114.6    0.1
F       -437.76782     114.3    114.4    -437.767705     113     114.3    0.1
CH3 [label garbled in source; inferred from the energy]
        -378.89605     114.2    115.0    -378.89593      113     114.8    0.1
NO2     -541.01434     117.3    115.1    -541.01296      113     114.5    0.9

R = H, constraint (9):

Form    E(9) (au)      χ (°)    τ (°)                ΔE(9-7) (kcal/mol)

(a)     -340.30957     109      180.0 (optimized)    1.4
(b)     -340.30667     109      150.0                3.2
(c)     -340.30465     109      135.0                4.5
(d)     -340.30390     109      120.0                5.0

on  the  energetics  between  the  deformed  and  undeformed  species,  but  the  results 
show  that  this  influence  is  small  and  of  the  order  of  0.5  kcal  despite  the  exaggerated 
bond  lengths  predicted  by  the  minimal  basis.  The  comparative  difference  between 
the  2-F  and  2-CH3  deformational  energies  is  1.4  kcal,  indicating  that  electronic 
influences  dictate  the  deformational  energy  of  the  2-CF3  compound.  The  correlated 
but  weaker  trend  is  repeated  in  the  constrained  form  (3). 

Table  III  shows  the  influence  of  the  2-substituents  on  4-H  and  5-H  in-plane 
bond-angle  distortion  to  be  negligible,  whereas  differences  in  Tables  II  and  III  again 
indicate  a  general  trend  in  the  ease  of  ring  deformation  with  decreased  electron 
ring  density  for  the  3-H  and  3-H,  4-H  deformations.  The  underlying  causes  of  this 
trend  may  be  examined.  The  primary  bond  angle  and  bond  length  changes  in  the 
aromatic  ring  of  the  constrained  form  (2)  are  given  in  Table  V.  Bond-distance 
changes at r1 and r3 for these small perturbations are very small, being ~0.003 and
-0.005, respectively, for all molecules. Leading bond-angle changes within the
aromatic ring are as expected and are very similar for both extended split-valence
and  minimal  basis  sets.  The  primary  internal  bond-angle  narrowing  produced  in 


Table  V.  Calculated  bond-angle  and  bond-length  distortions  in  the  aromatic  ring  for  structure  (2)  with  v  =  135°  (u,  undeformed;  d,  deformed): 
Preferred deformed conformer. Refer to Table II.
the meta position (-Δb) has compensation with expanded adjacent angles (a) and
(c) with a further bond-angle (d) contraction. Although there is a trend in increased
deformation with decreased σ-electron density, the intrinsic effects of substituents
on bond angles are more marked.

Figures 4 and 5 show the relative orientation of the ring substituents for the
deformed and undeformed species. A hyperconjugative interaction in the σ-bond
framework is evident. Figure 4 shows the tilts of the phenolic oxygen atoms given
Figure 5. 4-31G estimates for bond angles in the constrained conformer (2) of 2-substituted
phenols, and selected STO-3G bond lengths and bond angles for the constrained (d) and
fully optimized (u) conformers.
by angle α from the simple sp2 hybridization value, where reduced electron density
in the σ-bond framework increases the angle α due to the lone-pair overlap of the
oxygen atom with the adjacent vacant antibonding C-C* orbital [20]. The ranking
of the CH3, H, CF3, and F derivatives is consistent with this effect, and the relative
orientation of the 2-substituent itself is similarly affected. The 4-31G data on the
2-H and 2-F structures give very similar results. The results indicate that weak
electrostatic repulsions between the 2-substituent group and the phenolic oxygen
atom do not affect the ranking. The positions of the 2-OCH3 and 2-NO2 derivatives
are more difficult to assess. Both 2-OCH3 and 4-OCH3 anisoles have facile rotation
of the methyl groups out of the plane of the ring, and the nonplanar conformers
have similar populations to the planar forms [15], but close steric contact of the
ortho-nitro group with the methoxy moiety is shown by marginal “in-plane” movement
of the anisole moiety toward the ortho-substituent (Table IV). Taking the
mean “in-plane” tilt of the undeformed aromatic ring (i.e., the difference in the
external bond angles affecting the oxygen atom) as a guide to the overall effect gives
the tilt angles: F, 4.1°; H, 3.5°; OCH3, 3.3°; CF3, 3.6°; NO2, 2.5°; and
CH3, 2.5°.

The effect of deformation affecting the 4-H atom is weak, but consistent with
the expected trend. In the constrained conformer (3) in Table II, expansion of the
para-hydrogen atom bond angle w, with the consequent narrowing of the bond
angle a (-Δa), would create a compensatory effect on b (+Δb). Thus, electronegative
2-substituents would have a reduced effect due to this compensation. Within the
limits of the basis, some evidence for this is shown by the position of the 2-H
derivative in the relative energetics of the 2-CH3, 2-H, and 2-F constrained forms.
The 2-H derivative energy is relatively enhanced. This effect is confirmed by deformation
of the para-hydrogen atom alone. For w = 130°, the deformational
energies are for 2-F, 2.75; 2-CH3, 2.64; CF3, 2.53; and H, 2.52 kcal. Although the
influence of the 2-substituent is now very marginal, it is again striking that the
bond angle of the para-hydrogen atom in the fully optimized molecule correlates
with the deformational energies. Deformed structures of conformer (2) with the
same bond-angle constraint (v = 130°) show no relative change in the pattern of
meta-hydrogen atom deformation.

With these deformational patterns, it is seen that the effects of “in-plane” deformational
movements of the 1- and 3-substituents toward the 2-ortho-substituent
will correlate for small positional changes, but in the case of the 2-nitro group,
repulsive proximity effects are apparent even in the optimized form. Table IV shows
the STO-3G energetics for rotation about the planar O-C bond of the 1-OCH3
moiety in anisole, for some 6° “in-plane” deformation of the group toward the
2-H substituent. The “in-plane” deformation requires some 1.4 kcal/mol, whereas
a rotation of the dihedral angle up to 30° requires a further enthalpy of 1.8 kcal/mol,
giving a total of 3.2 kcal/mol.
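The figures quoted above follow from differences of the STO-3G total energies in Table IV. A minimal sketch of that conversion is given below; the conversion factor of 627.5095 kcal/mol per hartree is a standard value supplied here rather than taken from the article, and the function and variable names are hypothetical.

```python
HARTREE_TO_KCAL = 627.5095  # kcal/mol per hartree (standard conversion, assumed here)

def deformation_energy(e_deformed_au, e_reference_au):
    """Energy of a constrained structure relative to a reference, in kcal/mol."""
    return (e_deformed_au - e_reference_au) * HARTREE_TO_KCAL

# STO-3G total energies for anisole (R = H) from Table IV, in hartree.
e_optimized = -340.31184   # constraint (7), angle fully optimized
e_in_plane = -340.30957    # constraint (9a), in-plane deformation only
e_rotated = -340.30667     # constraint (9b), plus 30 degree rotation (dihedral 150 degrees)

d_in_plane = deformation_energy(e_in_plane, e_optimized)
d_total = deformation_energy(e_rotated, e_optimized)
d_rotation = d_total - d_in_plane

print(round(d_in_plane, 1), round(d_rotation, 1), round(d_total, 1))
```

The three printed values correspond to the in-plane deformation, the additional rotation, and their sum as discussed in the text.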

4. The Bound Agonist Conformation of β1-Adrenoceptor
Phenoxypropanolamine Partial Agonists: Side-chain Conformation

Data “in vitro” on the stimulant response of a number of phenoxypropanolamine
derivatives are given in Table I. Such data can be complicated by the signal amplification
of the responses [21], but evidence that this factor can remain constant
within given sets of β1-adrenoceptor binding data will be given in a separate article
[22]. The stimulatory response function in Table I may be written

$$-RT \log \frac{e_B/e_A}{1 - e_B/e_A} = -RT \log \frac{K_{as^*}}{K_{rs}} - RT \log (\tau + 1), \qquad (1)$$

where the maximum stimulatory response eB of the partial agonist B relative to
that of a full agonist A is related to the ratio of the agonist conformer (a) and
antagonist (r) receptor binding constants, the s and s* indicating the appropriate
receptor conformation in the binding; τ is the amplification factor. For high signal
amplification, the observed binding constant is dominated by the antagonist component,
and for the ortho-substituent set under examination, the binding constants
are invariant when referenced to an effective hydrocarbon environment [10] (±0.12
kcal/mol). The agonist response on the free-energy scale indicates a very sensitive
effect, one that gives an increment of 10 heart beats/min on the rat heart rate for
a free-energy change of little more than 0.1 kcal/mol. The comparative model for
the agonist activity of 2-substituted phenoxypropanolamine derivatives is, however,
one of particular simplicity. The compression of the aliphatic side chain is effectively
a constant of the comparison except for the effects produced by deformation of the
anisole moiety. Electrostatic interactions of the aromatic ring hydrogen atoms with
the adjacent receptor hydrogen-bond proton acceptor group are not expected to be
detectable (even the strongest O···CH3-X interaction, where X is halogen
or equivalent [23], would not contribute more than 20% of the overall variation
in the stimulus observed here). The 2-substituted conformers at the point of initiating
the stimulus thus have effectively identical interactions to the two electronegative
sites, whereas the antagonist contributions are also invariant. Under such conditions,
the observed biological variation is expected to correlate with the fraction of deformed
conformer but coupled to constant signal amplification. Since the 2-substituents
in the bound agonist form exist in a local nonpolar hydrocarbon environment,
variations in the deformed conformer populations are almost certainly
due to intrinsic properties of the molecular species.
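The stimulus function tabulated in Table I can be reproduced from the intrinsic sympathomimetic activities with a short calculation. The sketch below rests on two assumptions not stated explicitly in the table: that "log" denotes the natural logarithm and that the temperature is 310.15 K (the value given in the Figure 6 caption). The maximum response of 230 beats/min is taken from the Table I footnote; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3   # gas constant, kcal/(mol K)
E_FULL = 230.0         # beats/min, maximum response of the full agonist (Table I footnote)

def stimulus_function(isa_beats, temperature_k=310.15, e_full=E_FULL):
    """Return -RT ln[e/(1 - e)] in kcal/mol, with e = e_B/e_A.

    Assumes a natural logarithm and T = 310.15 K.
    """
    e = isa_beats / e_full
    return -R_KCAL * temperature_k * math.log(e / (1.0 - e))

# Intrinsic sympathomimetic activities (beats/min) taken from Table I.
for label, isa in [("2-F", 117), ("2-H", 104), ("2-CH3", 65), ("2-C2H5", 29)]:
    print(f"{label:7s} {stimulus_function(isa):+.2f} kcal/mol")
```

With these assumptions the computed values agree with the tabulated stimulus functions to within about 0.01 kcal/mol, which also illustrates how steeply the response varies on the free-energy scale.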

Figure 6 shows the hyperbolic response function of the ortho-substituted phenoxypropanolamine
derivatives plotted against the relative agonist to antagonist
conformer populations for meta- and para-hydrogen atom distortion on the thermodynamic
scale at 310 K. The relative conformer populations have been based
on the intrinsic STO-3G energies of the deformed and unconstrained species using
the phenol model. There is an obvious structural constraint on the 2-C2H5 agonist
conformer and its population has been taken as that of the 2-CH3 form adjusted
for a restriction in the equivalent overlapping meta region of the isoprenaline conformer,
giving an entropic change of 0.48 log10 units.
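The link between an energy difference, a relative conformer population, and "log10 units" can be sketched as follows. This is an illustration under the assumption of a simple Boltzmann ratio at 310.15 K, not the authors' procedure; the 0.68 kcal/mol entropic term is the value quoted in the Figure 6 caption, the 2.3 kcal/mol deformation energy is the value quoted in the text that follows, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol K)

def population_ratio(delta_e_kcal, temperature_k=310.15):
    """Boltzmann ratio of a deformed conformer relative to the undeformed one."""
    return math.exp(-delta_e_kcal / (R_KCAL * temperature_k))

def kcal_to_log10_units(delta_g_kcal, temperature_k=310.15):
    """Express a free-energy term as a change in log10 of a population or constant."""
    return delta_g_kcal / (R_KCAL * temperature_k * math.log(10.0))

# The 0.68 kcal/mol entropic contribution assigned to the 2-C2H5 conformer (Figure 6 caption).
print(f"{kcal_to_log10_units(0.68):.2f} log10 units")

# A deformation energy of about 2.3 kcal/mol for the meta-hydrogen atom.
print(f"relative population {population_ratio(2.3):.3f}")
```

The first value shows that the 0.68 kcal/mol term and the 0.48 log10 units quoted in the text are the same quantity in different units.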

This assumption is in agreement with an energetics of deformation of the 3-meta-hydrogen
atom toward the 2-substituent in the region of 2.3 kcal/mol. A
similar smaller symmetric “in-plane” deformational effect of the 1-OCH3 group of
some 1.5 kcal/mol will also correlate with this effect. Although it is not possible to
distinguish either effect separately, they account for the unusually high agonist
[Figure 7. Panel title: Benzdioxepine agonist + isoprenaline conformers]


Figure 7. Comparison of intermolecular distances in molecular overlap between the predicted
agonist conformers of the benzdioxepine molecule and isoprenaline.

action shown by the relatively large 2-CF3 substituent coupled with the general
observation that small ortho groups are generally associated with higher intrinsic
stimulant activity in this set of compounds. The failure of the planar 2-nitro group
to exhibit maximum stimulation in the corresponding analog argues for the dominant
effect of “in-plane” deformation of the 1-OCH3 group in the bound agonist


Figure 6. The hyperbolic response function for 1-isopropylamino, 3-(2R) phenoxy propan-2-ol
derivatives on the rat cardiac β1-adrenoceptor plotted on the thermodynamic scale
against the free-energy component for moderate distortion of the aryl meta- and para-hydrogen
atoms some 7° from their standard positions. The STO-3G conformer population
of the distorted conformer i with bond angles v = 127° and w = 21° relative to the
undistorted form r is modeled on 2-substituted phenol conformers. e is the intrinsic stimulant
activity of the compound. The free-energy component for R = C2H5 has been taken as
that for R = CH3 with an additional unfavorable entropic TΔS contribution of 0.68 kcal/mol
at 310.15 K.
[Figure 8, panels (a) and (b). Panel title: Contracted prenalterol agonist conformer: hydrogen bond relay]

Figure  8.  (a)  Hydrogen-bond  distances  in  the  activated  form  of  the  proton  transfer  system 
for  the  contracted  prenalterol  conformer  in  the  ligand  complex,  (b)  Relative  positions  of 
the  hydrogen-bond  relay  and  proton  shuttle  for  the  activated  receptor  complex  with  the 
contracted  prenalterol  conformer  (yellow)  and  with  the  isoprenaline  agonist  conformer. 


state. A guide to the net effect of this deformation can be given from the STO-3G
estimates  of  the  mean  “in-plane”  tilt  angle  of  the  undeformed  aromatic  ring. 

5. The Bound Agonist Conformation of β1-Adrenoceptor Phenoxypropanolamine
Partial Agonists: Consistency with the Transmembrane Proton Transfer Model

Figure 3(b) shows the effect of a rotation of 30° around the C7-O3 bond for an
enthalpic “in-plane” deformation of some 6° of the 1-OCH3 group toward the 2-substituent
in the case of prenalterol. The STO-3G estimate of the enthalpic increment
for the contraction given in (a) and (b) is some 6.0-7.0 kcal/mol. To attain
the closest structure to the staggered ethanolamine agonist conformation, the remaining
dihedral angle around the C7-C8 bond has been incremented some 25°
to minimize corresponding nitrogen and oxygen interatomic distances [Fig. 3(c)].
The interatomic distances in the overlap of prenalterol and isoprenaline are shown
in Figure 3(d). The main obvious difficulty is to reduce the beta- and para-hydroxy
interatomic distance within the constraints of the phenoxypropanolamine conformer.
Hydrogen-bond projections to receptor atoms permit a tolerance in this
distance, but rotation around the C7-O3 bond much further than 30° is
precluded with a planar -OCH2 moiety. Forcing the anisole moiety out of plane
within the agonist conformation by fixing the -CH2 moiety at 60° to the plane of
the aromatic ring might, therefore, raise the degree of agonist binding at the expense
of increased deformation of the receptor for both agonist and antagonist binding.
The predicted agonist conformer of a benzdioxepine molecule (Fig. 7) shows the
intermolecular tolerances in comparative overlap with the isoprenaline conformer
to reduce to ~0.5 Å. The benzdioxepine molecule has the highest intrinsic stimulatory
activity of any phenoxypropanolamine agent (refer to Table I) and is in
accord with the proposed agonist form for prenalterol.

The deformational analysis shows that within the model of agonist binding
of potent phenoxypropanolamine conformers the steric freedom of the ortho-
substituent is largely maintained but slight deformations in the neighboring 1-oxygen
and 3-hydrogen atom positions occur on tighter binding. Deformation in the position
of the aromatic ring 5-hydrogen atom in position a2 [Fig. 1(b)] is not detectable
and has a constant effect, if present, for the set of derivatives examined. The findings
are consistent with the proposed contracted agonist conformer within the activated
β1-adrenoceptor complex. The hydrogen-bond distances in the activated form of
the proton transfer system using the contracted phenoxypropanolamine conformer
are shown in Figure 8(a). The relative positions of the residues compared with those
for the isoprenaline conformer are given in Figure 8(b).

6.  Conclusions 

A small but important effect on the control of stimulant activity by ortho-
substituents in β1-adrenoceptor agents of the phenoxypropanolamine type can be
attributed to the enthalpy associated with deformational effects on contraction of
a given binding conformer. The deformations are dependent on the constraints of
the receptor site that require a planar phenoxy moiety. Minimal basis-set calculations
were found consistent with more extensive calculations to give insight into the
underlying causes of “in-plane” σ-bond deformations. The defined contracted
agonist conformer is of sufficient accuracy for calculating observed entropic differences
between binding of agonist and antagonist conformers in an explicit model of
transmembrane proton transfer within β1-adrenoceptor α-helices.

Abstract

We have performed a conformational analysis of double-stranded (dA:pT)5 and triple-stranded
(dA:pT·pT)5 helices for all possible variants of mutual orientation of the oligoamide and
oligonucleotide strands by means of AMBER 3.0. The computations showed that the conformational
flexibility of chimeric helices is practically the same as that of DNA, even though the orientation
of atoms around the amide bond is almost planar; cis- and trans-orientations are close in energy.
Permissible changes in the helical parameters of chimeric helices practically coincide with the
corresponding parameters of double- and triple-stranded DNA helices. Double-stranded chimeric
helices exhibit a tendency to twist, accompanied by a decrease in helical pitch. Three-stranded
chimeric complexes, on the contrary, exhibit a tendency to unwind. The energy gain of chimeric
helices is noticeable: double-stranded chimeras have an energy about 20 kcal/mol per monomer unit
lower than double-stranded DNAs, and the energy gain of triple-stranded chimeric complexes is about
40 kcal/mol per monomer unit. There is a qualitative correlation between the experimentally obtained
enthalpy of chimeric complexes and their calculated potential energy. This fully explains the ability
of oligoamides to interact with DNA through oligoamide strand invasion of the duplex with D-loop
formation. The dependence of energy on mutual strand orientation in chimeric duplexes is weak. The
energy penalty of duplexes with parallel orientation of the 5' → 3' and N → C chain vectors is about
0.7 kcal/mol per monomer unit. The dependence of energy on mutual strand orientation in chimeric
triplexes is much more appreciable. The most advantageous is parallel orientation of the 5' → 3' and
N → C vectors of the Watson-Crick chains accompanied by antiparallel orientation of the Hoogsteen
oligoamide chain. It was shown that the stability of double-stranded oligonucleotides may be
increased by an oligoamide insert of three or four monomer units in one of the oligonucleotide
chains. The length and base sequence of the insert allow one to modulate the degree of duplex
stabilization. It is important that such stabilization may be obtained without any distortion in the
vector character of nucleotide duplex formation. This method of stabilization of helices is evidently
suitable for triplexes as well. Moreover, in this way, one can overcome the difficulties connected
with the low penetration ability of PNA into living cells. © 1994 John Wiley & Sons, Inc.


Introduction 

Three years ago, Nielsen and his colleagues designed a polyamide that could
recognize both double- and single-stranded DNAs through Watson-Crick and
Hoogsteen base pairing [1,2]. The structural element of this polyamide was the
2-aminoethylglycine unit with thymine attached through a methylenecarbonyl group.
Its complexes with oligodeoxyadenylic acid were studied in detail [1-5]. They
are, as a rule, three-stranded, consisting of one oligodeoxynucleotide and two
oligoamide chains. Duplex formation between thymine-containing oligoamide and
oligodeoxyadenylic acid was also registered [1], but rarely. If the oligoamide chain
carries not only thymines but also cytosines, the 2:1 stoichiometry of the complexes
is preserved [6]. But if all four nucleic bases are present in the oligoamide chain,
then only duplexes were registered [7]. Melting temperatures of chimeric complexes
are considerably higher than those of double- or triple-stranded helical DNAs of the
same sequence. Binding of oligoamide chains to DNA may influence biological
functioning. Thus, complete inhibition of restriction enzyme cleavage was obtained
as a result of PNA binding [4]. There are also indications of disruption of
transcription and translation as well as a gene-specific antisense effect in
mammalian cells [8,9] as a consequence of PNA formation.

It is also useful to investigate another possible application of these oligoamides.
We propose to use a short PNA insert in a single oligonucleotide chain,
as it may influence the stability of its duplexes with intact complementary
oligonucleotides. The character of the insert (its length and base sequence) provides
the parameters that may allow us to modulate the stability of double-stranded
oligonucleotides.

The conformational possibilities of chimeric helical complexes are not sufficiently
known. There are two amide groups in every monomer unit of the oligoamide chain,
and some peculiarities of the conformational behavior of chimeric duplexes and triplexes
may be expected. The vector character of the oligoamide and oligonucleotide chains
noticeably enlarges the number of principally different conformers. Hence, the
conformational possibilities of PNAs demand special investigation.

In light of the above, it is important to have information about the
conformations of PNA. AMBER 3.0 [10] was used for the complete
conformational analysis of double-stranded (dA:pT)5 and triple-stranded
(dA:pT·pT)5 helices. We also analyzed the influence of short PNA inserts in one
strand of the double-stranded decanucleotide (dA:dT)10 (both in the dA chain and
in the dT chain) on the helical parameters and their alteration at the boundary of the
insert, and estimated the possibility of duplex stabilization with the help of such
inserts.


Models  of  Helical  Chimeric  Complexes:  Starting  Structures 
for  Energy  Minimization 

There is convincing evidence for the existence of Watson-Crick and
Hoogsteen binding between thymines of oligoamides and adenines of oligonucleotide
chains in PNA duplexes and triplexes (Fig. 1). If we take into consideration
the isomorphism of amide and nucleotide monomer units, we may use as starting
coordinates those of double- and triple-stranded DNAs obtained from the fiber
diffraction data of Arnott and his colleagues [11,12]. It should be emphasized that
each of Arnott’s structures allowed us to obtain four variants of starting structures
of oligoamides (Fig. 2). Structures of types I and III differ from the structures
of types II and IV in the chain direction.


PNA  COMPLEXES  OF  POLYNUCLEOTIDES  AND  POLYAMIDES 




Triplet  ATT 


Figure 1. Hydrogen bonding in the A:T base pair and in the A:T·T base triplet; “a” and
“b” denote the nucleobase side in accordance with [14].


(a) Models of Double-Stranded Chimeric Structures

There are two principally different chimeric duplexes: with parallel and with antiparallel
orientation of the 5' → 3' and N → C vectors of the Watson-Crick chains [Fig. 3(A)].
Each of them can be realized by two different schemes: I and III for antiparallel
and II and IV for parallel orientation of the chains. If we use the X-ray coordinates
of both the B- and A-forms of DNA, we have four starting points for antiparallel
and four starting points for parallel structures for energy optimization of chimeric
duplexes.


(b)  Models  of  Double-Stranded  Oligonucleotides  with  Oligoamide  Insert 

Oligoamide inserts of 2, 3, and 4 monomer units were introduced, in turn, into each
chain of the double-stranded decanucleotide (dA:dT)10. Calculations were done for
structures with inserts in both the dA and dT chains. Figure 4 shows the molecular
structure at the boundaries of the inserts. All variants of the mutual orientations
of the chains in the insert and in the decanucleotide were investigated [Fig. 3(B)].

Figure 3. Different variants of chain orientations in chimeric complexes: (A) two variants
of chain orientations in PNA duplexes; (B) two variants of orientations of PNA insert in
DNA duplexes; (C) four variants of chain orientations in PNA triplexes (two parallel and
two antiparallel).


(c)  Models  of  Triple-Stranded  Structures 

In triple-stranded structures, every type of mutual orientation of the Watson-Crick
chains allows, in addition, two variants of Hoogsteen chain orientation [Fig. 3(C)].
Hence, there are four essentially different chimeric triplexes. Each of these variants
may be realized with the help of the four different starting schemes presented in
Figure 2. Therefore, we have 16 different starting points for energy optimization
of chimeric triplexes.
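
The counts of starting structures quoted in the model sections can be checked by simple enumeration. The following is only a bookkeeping sketch: the scheme numerals I-IV and the A/B forms come from the text, while the function names and the "parallel"/"antiparallel" labels used for the Hoogsteen chain are hypothetical conveniences.

```python
from itertools import product

# Schemes of Figure 2 grouped by the chain orientation they realize in a
# duplex (grouping as stated in the text).
DUPLEX_SCHEMES = {"antiparallel": ("I", "III"), "parallel": ("II", "IV")}
DNA_FORMS = ("A", "B")


def duplex_starting_points():
    """Return (orientation, scheme, DNA form) for every duplex start."""
    return [
        (orientation, scheme, form)
        for orientation, schemes in DUPLEX_SCHEMES.items()
        for scheme, form in product(schemes, DNA_FORMS)
    ]


def triplex_starting_points():
    """Return (Watson-Crick orientation, Hoogsteen orientation, scheme)."""
    orientations = ("parallel", "antiparallel")  # hypothetical labels
    schemes = ("I", "II", "III", "IV")
    return list(product(orientations, orientations, schemes))


duplexes = duplex_starting_points()
triplexes = triplex_starting_points()
print(len(duplexes), len(triplexes))
```

Under these assumptions the enumeration reproduces the four antiparallel plus four parallel duplex starts and the 16 triplex starts.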
Figure 4. Peculiarities of molecular structure on the boundaries of the PNA insert in
DNA duplexes.


Computational  Method 

(a)  Scanning  of  Helical  Parameters  Space 

We have written some auxiliary programs that allowed us to obtain atomic
coordinates of double- and triple-stranded regular helices from the coordinates of the first
monomer unit. We obtained a conformational map for every type of structure
described above by a two-step procedure. In the first step, we searched
for the local minima of the structure corresponding to fixed helical parameters
(d, τ), scanned over a wide but reasonable range of values, and to a fixed mutual
orientation of bases in base pairs (or triplets). Naturally, after an arbitrary helical
transformation of the monomer coordinates, the bond lengths and bond angles in the
sections of the oligomers connecting monomer units are far from equilibrium, but
several cycles of energy optimization are enough to bring them close to equilibrium.
Further optimization brings the structure to a local minimum of the potential
energy with fixed helical and base parameters. The second step was energy
optimization without restriction on the mutual base orientation, which brings the
structure to the local minimum closest to that fixed in the first step.
Carrying out this procedure over all reasonable regions of the helical parameters
(d, τ) allowed us to judge the potential surface of every type of chimeric structure.
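
The auxiliary programs themselves are not described further. As a minimal sketch of the idea, assuming the helix axis coincides with the z axis and that the two helical parameters are the rise per unit and the twist per unit, regular helix coordinates can be generated from the first monomer unit by repeated rotation and translation. All function names and the numerical values below are illustrative assumptions, not the authors' code.

```python
import math


def build_regular_helix(monomer, rise, twist_deg, n_units):
    """Generate n_units copies of `monomer` along a helix around the z axis.

    monomer   : list of (x, y, z) atomic coordinates of the first unit
    rise      : axial translation per unit (same length unit as coordinates)
    twist_deg : rotation per unit about the axis, in degrees
    """
    helix = []
    for k in range(n_units):
        angle = math.radians(k * twist_deg)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        unit = [
            (x * cos_a - y * sin_a, x * sin_a + y * cos_a, z + k * rise)
            for x, y, z in monomer
        ]
        helix.append(unit)
    return helix


def scan_grid(rises, twists):
    """All (rise, twist) pairs to be held fixed during the first step."""
    return [(d, t) for d in rises for t in twists]


first_unit = [(5.0, 0.0, 0.0), (6.0, 1.0, 0.5)]  # made-up coordinates
helix = build_regular_helix(first_unit, rise=3.4, twist_deg=36.0, n_units=5)
grid = scan_grid(rises=[3.0, 3.4, 3.8], twists=[30.0, 36.0, 42.0])
```

Each grid point would then serve as a starting structure for the constrained minimization of the first step.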
(b) Scanning of Dihedral Angle Space

It was shown earlier by conformational computations of canonical antiparallel
[13] and parallel [14] polynucleotide structures that there are two main stable
and essentially different regions of conformational angles in DNA duplexes. The
first is characterized by a (gauche, gauche-) region of orientation around the C4'-C5'
and O5'-P bonds. It is the region of the lowest conformational energy
and is typical of experimentally determined duplex structures of different forms of
DNA, RNA, and DNA triplexes. The second is characterized by a (trans, trans)
region of orientation around these bonds. We call it the MIN2 region. It is
uncommon and is realized experimentally in only a few cases. One example is the
conformation of the dT chain in the hybrid duplex poly(A):poly(dT) [15]. We propose
that this conformation may play a significant role in structures in which the hydrogen
bonds in double-stranded complexes differ from the canonical Watson-Crick type,
namely, in parallel-stranded duplexes.

The amide angle (ω) is specific to the polyamide chain. The orientation around the
C-N bond is nearly planar, owing to its partial double-bond character. Both cis- and
trans-orientations are possible. In polypeptides, the trans-orientation around this bond
is favored because of steric repulsion of the Cα atoms of neighboring residues (Pro is
an exception to the general rule). We had to elucidate to what extent the frozen
rotation around this bond influences the conformational mobility of whole
chimeric duplexes and triplexes. For this purpose, we obtained conformational
maps in the way described above, using four essentially different conformations of
the monomer unit (γ, α, ω): (gauche, gauche-, trans), (gauche, gauche-, cis), (trans,
trans, trans), and (trans, trans, cis).
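
To make the region labels concrete, the sketch below computes a torsion angle from four atomic positions with the standard atan2 formula and assigns it to the nearest idealized region. The region centres (0°, ±60°, 180°) are textbook idealizations assumed here for illustration; the actual boundaries used in the computations are not stated.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_deg(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, within [-180, 180]."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Components of b0 and b2 perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


# Idealized region centres in degrees (an assumption of this sketch).
REGION_CENTRES = {"cis": 0.0, "gauche+": 60.0, "trans": 180.0, "gauche-": -60.0}


def classify(angle_deg):
    """Name of the idealized region whose centre is closest to the angle."""
    def circular_distance(a, b):
        return abs((a - b + 180.0) % 360.0 - 180.0)
    return min(REGION_CENTRES,
               key=lambda name: circular_distance(angle_deg, REGION_CENTRES[name]))


trans_angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
cis_angle = dihedral_deg((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
print(classify(trans_angle), classify(cis_angle))
```

A monomer conformation such as (trans, trans, cis) would then correspond to the labels returned for its three torsion angles.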

(c)  Methods  of  Optimization 

All structures were minimized in two stages. First, we used the steepest descent
algorithm with a force criterion of 0.5 kcal per step (or a maximum of 1000 cycles).
This was followed by the Polak-Ribière conjugate gradient algorithm with a force
criterion of 0.01 kcal per step.

(d)  Electrostatic  Energy  Term 

There are no charges for the oligoamide unit in the AMBER 3.0 program. We
had to add them to the program and make their magnitudes consistent with the whole
parameter system. For this purpose, we calculated atomic charges of the oligoamide
unit as well as charges of the nucleotide unit by two different semiempirical
methods, CNDO and AM1, and compared them with the charges adopted for peptides
and nucleotides in AMBER 3.0. The differences between the atomic charges obtained by the
AM1 method and the charges adopted in the AMBER 3.0 program are strongly
pronounced, but they are less pronounced in the case of the CNDO charges. In the AMBER
3.0 program, the value of the H-bonding energy as well as the torsion energy depends
on the magnitude of the atomic charges. Therefore, we had to choose the values
that are closer to the internal AMBER charges, namely, the charges obtained by the CNDO method.

Figure 5. Atomic charges in monomer units: (A) oligoamide unit; (B) oligonucleotide unit.


The resulting set of atomic charges of the monomer units is presented in Figure 5.
Calculations were carried out with the distance-dependent dielectric permittivity ε =
Rij (scale factor = 1).
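
With a distance-dependent dielectric ε = Rij, the Coulomb term between two charges falls off as 1/Rij² rather than 1/Rij. The sketch below illustrates this for point charges; the conversion constant of about 332 kcal·Å/(mol·e²) is the usual approximate value and is an assumption here, as is the omission of any bonded-pair exclusions that a real force field applies.

```python
import math

# Approximate conversion factor, kcal*Angstrom/(mol*e^2) (assumed value).
COULOMB_CONSTANT = 332.0


def electrostatic_energy(charges, coords, scale=1.0):
    """Pairwise Coulomb energy with dielectric eps = scale * r_ij.

    charges : partial charges in units of the electron charge
    coords  : matching list of (x, y, z) positions in Angstrom
    """
    total = 0.0
    for i in range(len(charges)):
        for j in range(i + 1, len(charges)):
            r = math.dist(coords[i], coords[j])
            eps = scale * r
            total += COULOMB_CONSTANT * charges[i] * charges[j] / (eps * r)
    return total


# Two opposite partial charges 2 Angstrom apart (made-up values).
pair_energy = electrostatic_energy([0.5, -0.5], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
print(pair_energy)
```

Doubling the separation in this model reduces the interaction fourfold, which damps long-range electrostatics in the absence of explicit solvent.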


Results  and  Discussion 

(a) Double-Stranded Complexes of PNA

Conformational  and  helical  parameters  and  corresponding  potential  energy  per 
monomer  unit  in  double-stranded  complexes  of  PNA  are  presented  in  Table  I  for 
each  of  four  conformational  regions  in  both  variants  of  mutual  chain  orientations. 
For a more convenient comparison, we carried out the optimization of the double-stranded
pentadeoxynucleotides (dA:dT)5 and (dAT:dTA)5 with the same parametrization
and optimization regime.

Table I. Conformation and intramolecular potential energy of chimeric two-stranded complexes (dA)5:(pT)5.

It is evident from the data in Table I that double-stranded helices become more twisted and
the helical pitch becomes smaller when the thymine-carrying oligonucleotide
chain is replaced by the oligoamide chain. This resembles DNA twisting in high-salt solutions
and formation of the C-form of DNA [16]. The physical basis
of such twisting is evidently the lack of high negative charges in the polyamide chain. The
most typical range of PNA twist is 40°-46°. However, both more twisted (τ = 49°, for the
[cis, trans, trans] type of conformer) and considerably more unwound (τ = 32°, for the
[trans, trans, trans] type of conformer) PNA conformations are possible. It is
important to note that these considerable changes in conformation characterize PNA
duplexes with antiparallel orientation of the chain vectors. The duplex with the lowest
potential energy, namely, the (trans, gauche, gauche) type of conformer, is also antiparallel.
Hence, conformational computations indicate that the most probable chimeric
duplex is characterized by antiparallel orientation of the 5' → 3' and N → C vectors
of the Watson-Crick chains. It has advantages both in potential energy and in
conformational entropy. The best conformer of this kind is presented in Figure 6.
It is interesting that the experimental work of Nielsen et al. [7] showed
that when the 1:1 stoichiometry is strictly fulfilled (in the cases when all four nucleic
bases are present in the PNA structure), exactly the antiparallel orientation of the strands is
observed.
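
The link between twisting and a smaller pitch follows from the geometry of a regular helix: the number of units per turn is 360° divided by the twist, and the pitch is that number times the rise per unit. The sketch below evaluates this relation; the rise of 3.4 Å is an illustrative B-DNA-like value assumed here, not a value taken from Table I.

```python
def units_per_turn(twist_deg):
    """Number of monomer units in one full helical turn."""
    return 360.0 / twist_deg


def helical_pitch(rise, twist_deg):
    """Axial length of one full turn for a regular helix."""
    return rise * units_per_turn(twist_deg)


ASSUMED_RISE = 3.4  # Angstrom, illustrative only
for twist in (36.0, 40.0, 46.0):
    print(f"twist {twist:.0f} deg: {units_per_turn(twist):.2f} units/turn, "
          f"pitch {helical_pitch(ASSUMED_RISE, twist):.1f} A")
```

At a fixed rise, increasing the twist from 36° to the 40°-46° range shortens the pitch, in line with the trend described for the chimeric duplexes.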

The ranges of the conformational angles in PNA duplexes of the (gauche,
gauche) type are identical to their ranges in oligonucleotides, with the exception of the
glycosidic angle χ. The latter adopts only values typical of the B-form, but not of the
A-form, of DNA.

The atomic orientation around the amide bond is nearly planar; its deviation
from planarity does not exceed 8°. However, this imposes no noticeable restriction on the
conformational flexibility of chimeric helices.


Figure 6. Conformer with antiparallel orientations of W-C strands in chimeric duplexes.
Stereoview.

The intramolecular potential energy of hybrid complexes is lower than that of
oligonucleotide duplexes. The energy gain is about 20 kcal/mol per nucleotide pair.

(b) Double-Stranded Oligonucleotide Complexes with PNA Insert in One
of the Chains

The most important thing to know before planning experimental work is to what
extent a short PNA insert may disturb the double-stranded helix. The energy gain
due to the type of insert is also informative. Therefore, in Table II, we present the
local helical parameters at the center of the inserts and at their boundaries for every
chain direction of the inserts in the (dA)10 and (dT)10 chains of the duplex. Only
conformations with the lowest intramolecular energy are presented. From the data
obtained, it is evident that antiparallel orientation of the oligoamide chain in the
insert gives the most stable decamers. But if the length of the insert is less than 3
monomer units, it causes an energy penalty in comparison with the
homooligonucleotide. Thus, the intramolecular potential energy of (dA:dT)10 is about -625
kcal/mol, and the most stable form of the decamer with an antiparallel insert in the
(dA)10 chain is only about -607 kcal/mol. But if the length of the insert in the
(dA)10 chain is 3 monomer units or more [4 or more for inserts in the (dT)10 chain],
it causes a noticeable energy gain. The gain reaches about -30 kcal/mol for inserts in
the (dA)10 chain, or less, about -18 kcal/mol, for inserts in the (dT)10 chain.


Table II. Perturbation of double-stranded DNA with short PNA inserts.

Local parameters of nonhomogeneous helices

Type of insert  Chain              Boundary         PNA insert           Boundary         Energy(a)
in decamers     orientation        d (Å)   τ (°)    dPNA (Å)  τPNA (°)   d (Å)   τ (°)    (kcal/mol)

(pT)2           5' → 3' / C ← N    3.58    33.5     3.24      30.0       2.95    31.3     -595.3
                5' → 3' / N → C    3.44    35.2     3.12      31.6       3.27    32.6     -593.0
(pT)3           [illegible]        3.35    36.7     3.43      31.9       2.88    32.8     -619.9
(pT)4           5' → 3' / C ← N    3.33    37.1     3.24      34.1       3.00    34.5     -643.3
(pA)2           5' → 3' / C ← N    3.10    28.4     2.74      28.2       3.42    27.5     -607.2
                5' → 3' / N → C    2.91    24.7     2.42      24.3       3.06    31.9     -597.9
(pA)3           5' → 3' / C ← N    3.83    32.2     3.05      26.7       3.08    27.0     -637.4
(pA)4           5' → 3' / C ← N    3.9     33.5     3.07      26.7       2.99    27.2     -655.5

(a) The energy value is given per double-stranded decamer.
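
The gains and penalties quoted above follow directly from the Table II energies and the reference value of about -625 kcal/mol for (dA:dT)10. The short sketch below repeats that arithmetic for the lowest-energy entry of each insert; the dictionary and function names are hypothetical, and the reference energy is the approximate figure given in the text.

```python
# (dA:dT)10 reference, kcal/mol per decamer ("about -625" in the text).
REFERENCE_ENERGY = -625.0

# Lowest-energy entries of Table II (kcal/mol per double-stranded decamer).
TABLE_II = {
    "(pT)2": -595.3,
    "(pT)3": -619.9,
    "(pT)4": -643.3,
    "(pA)2": -607.2,
    "(pA)3": -637.4,
    "(pA)4": -655.5,
}


def energy_change(insert):
    """Energy of the decamer with the insert minus the unmodified duplex."""
    return TABLE_II[insert] - REFERENCE_ENERGY


for name in TABLE_II:
    delta = energy_change(name)
    label = "gain" if delta < 0 else "penalty"
    print(f"{name}: {delta:+.1f} kcal/mol ({label})")
```

With these numbers, the (pA)4 and (pT)4 inserts give changes of about -30 and -18 kcal/mol, matching the values quoted in the text.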

There is no twisting of the helices possessing short PNA inserts. Thus, the helical
twist at the 3'-end and at the center of the insert is about 34° and rises slightly toward the
5'-end of the (dT)10 chain. Even more unwound helices are stabilized by the
inserts in the (dA)10 chain. Helices possessing inserts are inhomogeneous,
and the inhomogeneity is more pronounced if the inserts are in the (dT)10 chain. But this
inhomogeneity causes only weak distortions of the helical axis, as it is comparable
with the inhomogeneity due to sequence changes. The most stable structures of
decamers with 4-monomer-unit inserts in the (dA)10 and (dT)10 chains are presented
in Figure 7.

(c)  Triple-Stranded  Complexes  of  PNA 

We classify triple-stranded helical complexes of PNA on the basis of the mutual
orientation of the 5' → 3' and N → C vectors of the Watson-Crick chains. If this
orientation is parallel, we call them parallel; in the reverse case, we call
them antiparallel. Each of these types of triplexes contains two subtypes, depending
on the mutual orientation of the Hoogsteen pair of chains.

Conformational and helical parameters of the pentamers (dA:pT·pT)5 with the
lowest intramolecular potential energy for each of the four types of triplexes mentioned
above are presented in Table III. For a more convenient comparison, we present
the results of the optimization of the pentadeoxynucleotides (dA:dT·dT)5 with the same
parametrization and optimization regime from two different variants of starting
points: molecular structures with C3'-endo [12] and C2'-endo [17] sugars. The results
of the computations show that there are no conformational differences between
chimeric and nucleotide triplexes. The C3'-endo sugar is the preferable conformation
in all kinds of triplexes. An exception is the conformation of the completely parallel
triplex: the most stable conformation of a helix of this kind is characterized by
the C2'-endo form of the sugar ring. It should be noted that the helical twist of this
form is higher, and its pitch considerably lower, than in the others. Moreover, it is
an excellent illustration of the coupling of conformational parameters in helices,
because the glycosidic angle χ rises noticeably in accordance with the change of
the sugar form.

Conformers with the parallel type of chain orientation in chimeric triplexes are
characterized by the lowest potential energy. The best conformer of this kind is
presented in Figure 8. This type of triplex also allows the widest range of variation
of helical and conformational parameters, which is evidence of its largest
conformational entropy. So, it may be supposed that the parallel type of triplex
is the most probable under experimental conditions. Precisely this preference for this type
of mutual chain orientation in hybrid complexes of (dA)10 and the C-lysine,
N-acridine derivative of (pT)10 was found by Nielsen et al. [1]. But the
stoichiometry of the complexes was not known definitely in that work. In the next work
of this group of authors [2], the stoichiometry 2 PNA:1 DNA was established by
UV-titration curves.

Figure 8. Conformer with parallel orientations of W-C strands in chimeric triplexes.
Stereoview.


The intramolecular potential energy in chimeric triple-stranded complexes is
considerably lower than in oligonucleotide triplexes. The energy gain is about 40
kcal/mol per monomer unit. So, it is not surprising that oligoamides interact with
DNA, forming PNA triplexes in the process of D-loop formation [3-5]. Almarsson
et al. [18] performed molecular mechanics calculations of such a complicated system,
comprising two oligoamide chains and double-stranded DNA, forming a D-loop.
They indicated two main factors stabilizing such structures: “van der Waals
attraction between relatively nonpolar peptide strands, and stabilising electrostatic
effect arising from removal of one phosphodiester strand.” Our calculations of
three-stranded forms of PNA, in turn, show their intrinsic stability.

Abstract

1,3-Substituted-1,4-dihydropyridines easily add water to the 5,6-double bond under acid catalysis, resulting
in 6-hydroxy-1,4,5,6-tetrahydropyridines. The influence of various substituents in positions C-5 and
C-6 on the hydration of 1-methyl-1,4-dihydromethylnicotinate, used as a model compound, was
investigated in the framework of the AM1 molecular orbital approximation. Since the rate-limiting step of the
reaction is a proton transfer from the acidic species to the C-5 position of the substrate, calculated proton
affinities (PA) were used as reactivity indexes. The results, in agreement with experimental evidence,
indicated that electron-donating (+I) substituents increase the PA and destabilize 1,4-dihydropyridines
toward hydration, while electron-withdrawing (-I) groups have the opposite effect. Calculated vertical
ionization potentials (Ip) indicate that, similarly, +I groups facilitate the one-electron oxidation, while
-I groups stabilize the molecules toward this reaction. Several molecular properties derived from the
principle of maximum hardness were also used for the investigation of the stability of the dihydropyridines.
© 1994 John Wiley & Sons, Inc.


Introduction 

Dihydropyridines play an important role in numerous vital biological processes
as constituents of the ubiquitous NADH ↔ NAD+ coenzyme system; accordingly,
the chemistry of these compounds has been extensively investigated [1,2]. Recently,
1,4-dihydropyridine ↔ pyridinium quaternary salt moieties have been used in redox
targetor-based chemical delivery systems (CDSs) to transport drugs specifically to
the central nervous system [3-5]. While remarkably efficient in intravenous
formulations, CDSs are less suitable for oral administration because of the instability of
the 1,4-dihydropyridines in the harsh acidic conditions of the stomach. At low pH,
these derivatives easily add water across the 5,6-double bond, resulting in
6-hydroxy-1,4,5,6-tetrahydropyridines [6,7]. This unwanted transformation represents a
practical inconvenience, since the product of hydration both reduces the
lipophilicity of the CDS and prohibits the in vivo oxidation to the quaternary pyridinium
salt, two major requirements of this approach. Ways to circumvent this problem
have been recently explored [8]. The possibility of increasing the stability of the
5,6-double bond toward acid-catalyzed hydration by structural manipulation
of the dihydropyridine moiety is examined theoretically herein. The stability of
dihydropyridines toward chemical oxidation to pyridinium salts is also
investigated.

1-Methyl-1,4-dihydromethylnicotinate (I, Fig. 1) was selected as a model compound
for the study. The influence of substituents at the C-5 and C-6 positions (R1
and R2, respectively) on the reactivity of I toward acid-catalyzed hydration and
oxidation reactions was examined.


R1 and R2: see Table I


Figure 1. The mechanism of acid-catalyzed hydration of 1,4-dihydropyridines.

The theoretical study was performed in the framework of the AM1 molecular
orbital approximation [9], chosen for its well-documented ability to predict
heats of formation and molecular geometries accurately. No solvent effects were considered
in this study.


Methods 

Theoretical studies were performed using the AM1 molecular orbital method
[9,10] included in the MOPAC (version 5.10) package. A Tektronix Computer
Aided Chemistry (CAChe™) Worksystem running on an Apple Macintosh™ II computer
was used for all computations. The structural input was generated using a
Macintosh interface, and all starting geometries were obtained by molecular
mechanics (MM2) optimization. The Broyden-Fletcher-Goldfarb-Shanno
method [11-14] was used to optimize geometries as a function of the total
molecular energy. All geometric variables were optimized. The dynamic
"level-shift" method [15] was used to improve the convergence of the self-consistent field
(SCF). The "PRECISE" option was used to tighten the convergence criteria
for all optimizations. The closed- and open-shell species were investigated using
the restricted Hartree-Fock (RHF) approach.

Results  and  Discussion 

The mechanism of the acid-catalyzed hydration of the 1,4-dihydropyridines
[16,17] is illustrated in Figure 1 for the case of 1-methyl-1,4-dihydromethylnicotinate
(I). In the first, rate-determining step of the reaction, a proton is transferred from
the acid species AH to the C-5 position [the β-position of the enamine system
N(1)-C(6)-C(5)] of the substrate. The most probable transition state of
this step is II; the proton then forms a bond with C-5 by converting the π pair of
electrons into a σ pair, as in any electrophilic addition. The resulting intermediate III
has a positive charge on C-6, which can be accommodated by the N-1 atom in the
canonical form IV. The combination of the highly reactive intermediates III ↔
IV with a species carrying an electron pair, such as OH⁻ in the case of hydration,
is then very rapid, the final product being the 6-hydroxy-1,4,5,6-tetrahydro derivative
V. Obviously, other nucleophiles, if present in the reaction medium, can also attack
the positive ion.

The influence of several electron-donating (+I) (methyl) and electron-withdrawing
(-I) (halogen) substituents at C-5 (R1) and/or C-6 (R2) on the hydration
reaction was investigated. It is known [18] that electron-withdrawing
groups in the α position generally reduce the reactivity of double bonds, presumably by decreasing the
electron density, whereas electron-donating groups have the opposite effect.

However, a better explanation of the phenomenon can be given by using a
thermodynamic criterion. Since formation of the intermediate carbocation III ↔ IV is
the rate-limiting step of the reaction, it is reasonable to consider calculated proton
affinities (PA) of I as reactivity indexes. The PA of a compound B to form the conjugate
acid HB+ [19] can be determined by using the equation:

$$\mathrm{PA(B)} = \Delta H_f(\mathrm{H^+}) + \Delta H_f(\mathrm{B}) - \Delta H_f(\mathrm{HB^+})$$

which becomes, for the case described in Figure 1:

$$\mathrm{PA(I)} = \Delta H_f(\mathrm{H^+}) + \Delta H_f(\mathrm{I}) - \Delta H_f(\mathrm{III})$$

where ΔHf are calculated heats of formation for I and III (III and IV, being
canonical forms, have the same ΔHf). Since AM1 gives a very poor estimate of the
heat of formation of H+ (calculated: 314.9; observed: 367.2 kcal/mol [20]), the
experimental value was used in calculating PA.
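
As a worked illustration of the PA equation, the following minimal Python sketch evaluates PA(I) from tabulated heats of formation. The function name `proton_affinity` and the dictionary layout are hypothetical choices for this example; the numerical inputs are the AM1 heats of formation of I and III listed in Table I for compounds 2, 6, and 12, together with the experimental value of 367.2 kcal/mol for H+ quoted above.

```python
# Experimental heat of formation of H+ (kcal/mol), as quoted in the text.
DHF_PROTON_EXP = 367.2


def proton_affinity(dhf_base, dhf_conjugate_acid, dhf_proton=DHF_PROTON_EXP):
    """PA(B) = dHf(H+) + dHf(B) - dHf(HB+), all in kcal/mol."""
    return dhf_proton + dhf_base - dhf_conjugate_acid


# (dHf of I, dHf of III) in kcal/mol, copied from Table I.
# The dictionary keys are the compound numbers used in that table.
table_i = {
    2: (-95.0, 62.7),   # R1 = H, R2 = F
    6: (-98.9, 64.6),   # R1 = F, R2 = H
    12: (-60.1, 90.6),  # R1 = H, R2 = CH3
}

for compound, (dhf_i, dhf_iii) in table_i.items():
    pa = proton_affinity(dhf_i, dhf_iii)
    print(f"I-{compound}: PA = {pa:.1f} kcal/mol")
```

For these three compounds the sketch should reproduce the tabulated PA values to within rounding of the last digit.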

The heats of formation for I, III, and V, as well as the PA of I, for various R1
and R2 are presented in Table I. The results indicate that PA is indeed influenced
by the nature and position of the substituents. There is a 14.5 kcal/mol difference
between the lowest and highest calculated PA. Electron-withdrawing substituents
at C-5, C-6, or both positions decrease the PA and stabilize I toward hydration. A
decrease of the PA with the decrease of the electronegativity (relative to H) was
registered for the R2 substituents (at C-6) (order of PA: F > Cl > Br > I). In the
case of the R1 substituents (at C-5), PA increased with the decrease of the
electronegativity (order: I > Br > Cl > F). When both the C-5 and C-6 positions bear an
electronegative substituent (compounds I-10 and I-11), PA had the lowest value.
Based on these considerations, the most stable combinations toward hydration are,
in order, I-10 (5,6-difluoro), I-6, and I-11 (5-fluoro and 5,6-dichloro, respectively).
There is a 10 kcal/mol difference between the PA of I-1 (the unsubstituted
compound) and that of I-10 (the 5,6-difluoro derivative). Electron-donor substituents
(methyl) at C-6 (I-12) or at both C-5 and C-6 (I-14) increase the PA and
consequently the hydration reactivity (4 kcal/mol difference), while the substituent at


Table I. Calculated AM1 heats of formation (ΔHf) and proton affinities (PA) (kcal/mol).

Compound I   R1    R2    ΔHf(I)    ΔHf(III)   ΔHf(V)    PA
1            H     H     -54.1     100.7      -121.5    212.5
2            H     F     -95.0     62.7       -169.8    209.5
3            H     Cl    -59.1     98.7       -124.4    209.4
4            H     Br    -46.8     112.4      -106.4    208.0
5            H     I     -35.4     124.6      -95.0     207.2
6            F     H     -98.9     64.6       -167.9    203.7
7            Cl    H     -61.1     101.2      -127.9    204.9
8            Br    H     -49.6     113.2      -114.2    204.5
9            I     H     -38.3     123.6      -101.6    205.3
10           F     F     -138.0    -27.3      -211.7    202.0
11           Cl    Cl    -64.5     99.0       -126.3    203.7
12           H     CH3   -60.1     90.6       -121.3    216.5
13           CH3   H     -62.1     95.0       -129.0    210.2
14           CH3   CH3   -65.3     85.8       -123.9    216.5

STABILITY OF 1,3-SUBSTITUTED 1,4-DIHYDROPYRIDINES


C-5 (I-13) decreases the PA somewhat (by 2.3 kcal/mol). All these findings are
in agreement with the available experimental evidence [16].

Another important transformation of the 1,4-dihydropyridines is the oxidation
to the quaternary salts (VIII, Fig. 2). While the in vivo oxidation is a requirement
of the CDS approach, a fast chemical oxidation can be disadvantageous for the drug
(handling, storage, and formulation become difficult). The chemical oxidation has been
investigated thoroughly in a series of recent studies [21-23].

It was shown, for example [22], that observed second-order rate constants of
ferricyanide-mediated oxidations correlated well (r = 0.96) with AM1-derived
vertical ionization potentials; these results were consistent with a mechanism of
oxidation involving an initial rate-determining (or partially rate-determining) electron
loss from the heterocycle, followed by sequential proton and electron transfers (Fig.
2). Calculated vertical ionization potentials (Ip) for the compounds I (1-14) are presented
in Table II. The results indicate that electron-withdrawing groups (-I) increase Ip
and stabilize dihydropyridines toward oxidation, while electron-donating groups
(+I) decrease the Ip. These data are in agreement with experimental findings [24],
which indicate that +I groups (methyl) destabilize dihydropyridines, facilitating
their oxidation (by single electron transfer). However, differences in calculated
Ip are smaller (maximum 0.425 kcal/mol) than differences in calculated
PAs.

Several molecular properties derived from the principle of maximum hardness
(PMH) [25,26] were applied to the dihydropyridines I in order to discuss their
reactivity. The principle of maximum hardness is a result of the application of the


Figure 2. Oxidation of 1,4-dihydropyridines.

Table II. Calculated vertical ionization potentials (Ip) for I.

Compound   R1    R2    Ip (eV)
1          H     H     8.172
2          H     F     8.436
3          H     Cl    8.384
4          H     Br    8.396
5          H     I     8.384
6          F     H     8.303
7          Cl    H     8.323
8          Br    H     8.390
9          I     H     8.422
10         F     F     8.546
11         Cl    Cl    8.485
12         H     CH3   8.100
13         CH3   H     8.074
14         CH3   CH3   8.011

density functional theory (DFT) to chemistry. Hardness (η) is half the HOMO-LUMO
energy gap and can be defined by the equations:

$$\eta = (I - A)/2 = (E_{\mathrm{LUMO}} - E_{\mathrm{HOMO}})/2$$

where I is the ionization potential, A the electron affinity, and E(LUMO) and E(HOMO) the
energies of the lowest unoccupied and highest occupied molecular orbitals,
respectively. The maximum hardness principle asserts that systems tend to be as hard as
possible; thus a hard molecule has a large energy gap, and a soft molecule has a
small gap. A larger η means a larger I and a smaller A, which implies that the system has
a lower tendency to accept or to give up electrons; that is, the system is stable.
Soft molecules (softness is defined by σ = 1/η) are more reactive than hard
molecules. Another molecular property derived from DFT is the absolute
electronegativity (χ), defined by the equation:

$$\chi = -\mu = (I + A)/2$$

(μ being the electronic chemical potential). χ is similar but not equal to the Mulliken
electronegativity.
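
The definitions above can be sketched in a few lines of Python. This is only an illustration under a stated assumption: I and A are approximated from the frontier orbital energies as I = -E(HOMO) and A = -E(LUMO), which is what the second equality in the hardness equation implies. The function name `dft_descriptors` is hypothetical; the orbital energies are those listed in Table III for compounds 2 and 12.

```python
def dft_descriptors(e_homo, e_lumo):
    """Return (hardness, softness, electronegativity) from orbital energies in eV.

    Assumes I = -E_HOMO and A = -E_LUMO (a Koopmans-type approximation).
    """
    ionization = -e_homo
    affinity = -e_lumo
    hardness = (ionization - affinity) / 2.0           # eta = (I - A)/2
    softness = 1.0 / hardness                          # sigma = 1/eta
    electronegativity = (ionization + affinity) / 2.0  # chi = (I + A)/2
    return hardness, softness, electronegativity


# (E_HOMO, E_LUMO) in eV, copied from Table III; keys are compound numbers.
orbitals = {2: (-8.436, -0.067), 12: (-8.100, 0.216)}

for compound, (e_homo, e_lumo) in orbitals.items():
    eta, sigma, chi = dft_descriptors(e_homo, e_lumo)
    print(f"I-{compound}: eta = {eta:.3f} eV, sigma = {sigma:.3f} 1/eV, chi = {chi:.3f} eV")
```

Within rounding, the values obtained this way should agree with the hardness, softness, and electronegativity columns of Table III for the same compounds.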

Calculated η, σ, and χ are collected in Table III. The results indicate that there
are no significant differences in hardness and softness among the substituted
dihydropyridines, which means that the reactivity of the dihydropyridines I (1-14) is
not much different. Small differences in hardness and softness for isomers and
closely related compounds are not unexpected, however [26]. The results suggest
that the proton addition is more dependent on the thermodynamic stability of the
intermediate cation III than on the reactivity of I. Calculated electronegativity (χ)

Table III. Calculated highest occupied molecular orbital (HOMO) and lowest unoccupied molecular orbital
(LUMO) energies (E), hardness (η), softness (σ), and absolute electronegativity (χ) (eV) for I.

Compound   E(HOMO)   E(LUMO)   η       σ       χ
1          -8.172    0.182     4.177   0.239   3.995
2          -8.436    -0.067    4.185   0.239   4.252
3          -8.384    -0.038    4.173   0.240   4.211
4          -8.396    -0.058    4.169   0.240   4.227
5          -8.384    -0.176    4.104   0.244   4.280
6          -8.303    -0.079    4.112   0.243   4.191
7          -8.323    -0.065    4.129   0.242   4.194
8          -8.390    -0.119    4.136   0.242   4.255
9          -8.422    -0.117    4.153   0.241   4.270
10         -8.546    -0.332    4.107   0.244   4.439
11         -8.485    -0.262    4.112   0.243   4.374
12         -8.100    0.216     4.158   0.241   3.942
13         -8.074    0.186     4.130   0.242   3.944
14         -8.011    0.226     4.119   0.243   3.893

can be correlated to the rate of oxidation. A decreasing electronegativity indicates
an increase of the rate of oxidation, since lower electronegativity reflects a greater
tendency to lose electrons. Data presented in Table III indicate that +I groups
decrease χ, increasing the rate of oxidation, while -I groups have the opposite
effect. These data are in agreement with the previous findings based on calculated
ionization potentials.

In summary, the AM1 study indicates that electron-withdrawing groups at the C-5
and C-6 positions stabilize the 1,4-dihydropyridine moieties toward both
acid-catalyzed hydration and oxidation reactions. These results could have practical
applications.

Abstract

Our semiempirical method of calculating log P, which uses a linear combination of various combinations
of descriptors based on the AM1-optimized geometry of the molecule, has been applied to a set of 38
substituted phenols and has been shown to provide a better quantitative structure-activity
relationship (QSAR) than that of the Hansch-type approach in a study of the inhibitory activity
of substituted phenols on Bacillus subtilis spore germination. This model shows that the calculated
partition coefficient, the geometrical descriptors, and electronic effects are the major factors determining
the biological activity. © 1994 John Wiley & Sons, Inc.

Introduction 

The goal of workers in the area of quantitative structure-activity relationships
(QSAR) has been the development of quantitative methods of determining the
activities of a series of compounds. There have been numerous mathematical attempts
to correlate molecular structure with drug activity. A significant aspect of this
kind of study is the often-found relationship between the biological properties of
molecules and their partition coefficient (log P).

Yasuda et al. [1] correlated the log P values of substituted phenols directly with
their inhibitory activity (log 1/I50) on Bacillus subtilis spore germination. The log
P values used were combinations of experimental data and calculated results
obtained by the Hansch fragment method. Klopman et al. [2] correlated this
inhibitory activity with functional group descriptors in the same way that they
developed their log P model. In this study, we used the calculated log P values from
our BLOGP program [3]. By applying the same semiempirical method as before
[4-8] to the entire set of 38 substituted phenols, we found that this method provides
a better quantitative structure-activity relationship than that of the Hansch-type
approach and also uses fewer parameters than does Klopman's approach in the
study of inhibitory activity of substituted phenols on B. subtilis spore germination.

Methods 

All 38 compounds were computed on the Tektronix CAChe (Computer
Assisted Chemistry) workstation. For each compound, the MM2 [9] geometry is the starting
point for the AM1 [10] calculation. In all cases, the default
(Broyden-Fletcher-Goldfarb-Shanno) geometry search method was employed to yield full geometry
optimization. The default SHIFT = 15 eV option was used to allow 15 eV of
damping of the SCF iterations, as determined by the rate of convergence, and the
PRECISE option was used to tighten the convergence criteria for the SCF
procedure and for the geometry optimization. Based on the AM1-optimized geometry
and the van der Waals radii of each atom, the molecular surface, volume, and
ovality were calculated in the same way as before [4-6].

The inhibitory effect of various concentrations of phenols on the germination
rate in 0.1 mM L-alanine at pH 7.2 was studied by Yasuda et al. [1]. The molar
concentration of a phenol necessary to cause 50% inhibition of the germination
rate (I50) was also determined by them. Each I50 value represents the mean of three
determinations.

The log P value for each molecule was calculated by our BLOGP program on a
MicroVAX computer, and linear combinations of the calculated descriptors were
fitted to the observed inhibitory activity (log 1/I50) by using a regression program
developed for the VAX computer. As mentioned by Klopman et al. [2], the general
form of the QSAR is

$$\log_{10}(\mathrm{act.}) = a + b\,(\log_{10} P) + c\,(\text{other descriptors}) \tag{1}$$

and

$$\log_{10} P = \sum_i n_i P_i \tag{2}$$

where the P_i correspond to the different molecular properties that we have developed
for log P. The "other descriptors" are descriptors of which some may contribute
to log P but may also make an additional contribution to activity.

Results  and  Discussion 

Instead of using the combination of experimental data and the calculated results
from the Hansch fragment method for the log P, we used the BLOGP program on
our VAX computer to calculate log P. The calculated log P value for each compound
is listed in Table I. We first attempted to correlate the log 1/I50 activity with the
partition coefficient alone, which resulted in quite a good correlation:

$$\log_{10}(1/I_{50}) = 1.531 + 0.6725\,\log P \tag{3}$$

n = 38, r = 0.825, s.d. = 0.426, F = 76.582.
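
The statistics quoted with the regression equations can be cross-checked against each other. The sketch below assumes that F is the usual overall regression F-statistic, F = (r²/k) / ((1 - r²)/(n - k - 1)), with k the number of descriptors; the source does not state how F was computed, so this is an assumption, and the function name `overall_f` is hypothetical. The inputs are the n and r values quoted with Eqs. (3), (4), and (5).

```python
def overall_f(r, n, k):
    """Overall regression F-statistic from the correlation coefficient r,
    the number of compounds n, and the number of descriptors k."""
    r2 = r * r
    return (r2 / k) / ((1.0 - r2) / (n - k - 1))


# (r, number of descriptors, reported F) for Eqs. (3), (4), and (5); n = 38 throughout.
reported = {3: (0.825, 1, 76.582), 4: (0.867, 2, 52.920), 5: (0.897, 3, 46.710)}

for eq, (r, k, f_reported) in reported.items():
    f_calc = overall_f(r, 38, k)
    print(f"Eq. ({eq}): F from r = {f_calc:.2f}, reported F = {f_reported:.3f}")
```

Because r is quoted to only three decimals, small differences between the recomputed and reported F values are expected.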

The regression analysis for the 38 substituted phenols is described in Eq. (3), where
n is the number of compounds submitted to the regression; r, the correlation
coefficient; s.d., the standard deviation; and F, the overall statistical significance of the
equation. Our calculated log P alone provides a better structure-activity relationship
than that of Yasuda's study. Overall, our calculated partition coefficients are quite
close to the set that has been used by Yasuda et al. [1], except for the 2,3,4,5,6-Cl5
phenol and hexachlorophene. The experimental log P data for 2,3,4,5,6-Cl5 phenol

Table I. Inhibitory activity of substituted phenols on B. subtilis spore germination.

No.  Phenol substituent    log 1/I50 (exp)  log 1/I50 (ours)  log P (calcd)  Ovality  HARD (eV)
1    2-CH3                 2.89             2.98              1.702          1.276    9.367
2    2-C2H5                3.31             3.18              2.101          1.321    9.363
3    2-CH(CH3)2            3.57             3.39              2.484          1.362    9.386
4    2-CH2CH(CH3)2         3.62             3.54              2.860          1.410    9.374
5    2-C(CH3)3             4.46             3.69              2.855          1.379    9.399
6    2-Cl                  3.25             3.16              1.829          1.251    9.289
7    2-NO2                 2.50             2.74              1.903          1.273    8.726
8    2-C6H5                3.55             3.55              3.152          1.382    8.813
9    2-OCH3                2.20             2.40              1.353          1.303    9.174
10   2-CHO                 2.42             2.32              1.278          1.264    8.924
11   2-COOCH3              3.00             2.41              1.715          1.348    8.961
12   2-NH2                 1.52             1.32              0.188          1.273    9.048
13   4-CH3                 2.44             2.88              1.705          1.293    9.307
14   4-C2H5                2.80             3.07              2.121          1.341    9.298
15   4-CH(CH3)2            3.17             3.32              2.555          1.390    9.363
16   4-CH2CH(CH3)2         3.52             3.49              3.002          1.449    9.333
17   4-C(CH3)3             3.52             3.57              2.889          1.408    9.365
18   4-Cl                  3.10             2.99              1.778          1.268    9.219
19   4-NO2                 2.17             2.66              1.723          1.299    9.006
20   4-C6H5                3.89             3.43              3.194          1.386    8.610
21   4-CHO                 1.70             2.19              1.171          1.290    9.046
22   4-COOCH3              2.70             2.36              1.616          1.366    9.139
23   4-H                   2.82             2.92              1.300          1.224    9.512
24   3-CH3                 2.70             2.93              1.682          1.292    9.410
25   2-CH(CH3)2, 5-CH3     3.31             3.49              2.917          1.419    9.272
26   2,4,6-(CH3)3          2.85             3.20              2.597          1.390    9.122
27   2,4-Cl2               3.62             3.29              2.346          1.294    9.026
28   2,4,5-Cl3             3.89             3.44              2.773          1.325    8.833
29   2,4,6-Cl3             3.89             3.49              2.802          1.327    8.889
30   2,3,4,6-Cl4           3.57             3.75              3.276          1.344    8.702
31   2,3,4,5,6-Cl5         3.46             3.97              3.639          1.357    8.596
32   (Hexachlorophene)     3.70             3.95              4.539          1.539    8.423
33   3-NO2                 3.05             2.49              1.703          1.300    8.799
34   2,5-(NO2)2            1.48             1.72              1.418          1.343    8.355
35   3-OH                  2.42             2.24              0.851          1.259    9.370
36   3-OH, 5-CH3           1.77             2.33              1.268          1.325    9.314
37   3-NH2                 1.35             1.18              0.046          1.275    9.053
38   4-Br                  3.14             3.32              2.150          1.273    9.211

listed by Hansch and Leo [11] are 5.01, 5.12, 5.86, and 3.81. Our calculated log P
for 2,3,4,5,6-Cl5 phenol is 3.639, which is quite close to one of the experimental
results. The experimental data for hexachlorophene listed by Hansch and Leo [11]
are 2.62 and 7.54. The log P of 2.62 is at pH 12.5, and the compound forms a salt;
the log P value of 7.54 is a calculated value and is based on the assumption of only
mono ion partitions at pH 12.5, so neither of them is satisfactory for our purposes.

Figure 1. The experimental values plotted against our calculated values (axis label: exp. log 1/I50).


The second most important parameter that correlated with the activity is the
electronic descriptor HARD. The results are shown in Eq. (4):

$$\log_{10}(1/I_{50}) = -4.9975 + 0.7388\,\log P + 0.7029\,\mathrm{HARD} \tag{4}$$

n = 38, r = 0.867, s.d. = 0.375, F = 52.920,

where HARD is twice the absolute hardness of the molecule, as discussed by Pearson
[12]: The absolute hardness is one-half of the calculated energy difference between
the LUMO, the lowest unoccupied molecular orbital, and the HOMO, the highest occupied
molecular orbital, in units of eV. We know that soft molecules, with a small
LUMO-HOMO gap, will be more polarizable than will hard molecules, with a large
LUMO-HOMO gap. This suggests that electronic interactions, or removal of one electron
from the oxygen in the phenol group or hydrogen bonds to a receptor from the
phenolic group, play a significant role in the interaction of the receptor-ligand
molecule.

The best relationship among the activity of B. subtilis spore germination and the
hydrophobicity, the geometrical descriptors of the substituted phenols, and the
electronic effects is given by Eq. (5):

$$\log_{10}(1/I_{50}) = 0.1342 + 0.9631\,\log P - 4.2143\,O + 0.7036\,\mathrm{HARD} \tag{5}$$

n = 38, r = 0.897, s.d. = 0.333, F = 46.710,

where O is the ovality of the molecule, which is dimensionless and has been described in previous papers
[4-6]. Our standard deviation for this study, 0.333, is better
than that of Klopman et al. [2], 0.43, and was obtained with three parameters instead of seven.
From Eq. (5), we see that the partition coefficient and the electronic descriptor,
HARD, have positive influences on log 1/I50, and the geometric descriptor,
ovality, has a negative influence on log 1/I50.
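
To make the use of Eq. (5) concrete, the following Python sketch evaluates it for three rows of Table I. It reads the ovality term of Eq. (5) as 4.2143 times O, an interpretation of the printed equation that the tabulated "ours" values appear to support; the function name `predict_log_inv_i50` and the dictionary layout are hypothetical.

```python
def predict_log_inv_i50(log_p, ovality, hard):
    """Eq. (5): log10(1/I50) from calculated log P, ovality, and HARD (eV)."""
    return 0.1342 + 0.9631 * log_p - 4.2143 * ovality + 0.7036 * hard


# substituent: (experimental log 1/I50, calculated log P, ovality, HARD in eV), from Table I.
rows = {
    "2-CH3": (2.89, 1.702, 1.276, 9.367),
    "2-C(CH3)3": (4.46, 2.855, 1.379, 9.399),
    "2-C6H5": (3.55, 3.152, 1.382, 8.813),
}

for substituent, (observed, log_p, ovality, hard) in rows.items():
    predicted = predict_log_inv_i50(log_p, ovality, hard)
    residual = observed - predicted
    print(f"{substituent}: predicted = {predicted:.2f}, residual = {residual:+.2f}")
```

The predictions should match the "ours" column of Table I to within about 0.01, and the residual obtained for the 2-C(CH3)3 phenol corresponds to the largest residual discussed in the text.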

Both the values of log 1/I50 calculated from Eq. (5) and the experimental values are
shown in Table I. The fit is good across all 38 substituted phenols. The 2-C(CH3)3
phenol has the largest residual of the 38 substituted phenols, 0.77, and it also shows
a significantly large residual in the study by Klopman et al. [2], 0.65, and by Yasuda
et al. [1], 1.19. One may be quite suspicious about the experimental data for this
2-C(CH3)3 phenol, because the 4-isobutyl and 4-t-butyl phenols have identical
inhibitory activities, whereas for the 2-CH2CH(CH3)2 and 2-C(CH3)3 substituents the two
isomers differ by 0.84 in the log 1/I50 value.

In conclusion, we find that by using our semiempirical approach to this QSAR
study our results are better than those of previous studies. Figure 1 shows the
experimental values plotted against our calculated values.

Abstract

The electronic structure of the low-temperature phosphorescent probe 6-thioguanine was investigated.
Structural transitions in some alcohols (ethanol, glycerol, propanediol) were observed using this probe
in the range 4.2-273 K. Aqueous solutions of native and denatured DNA and those of native DNA with
propanediol and DMSO added were studied in the range 4.2-273 K. The analysis of the luminescence
spectra of DNA solutions suggests that energy is transferred to the probe upon UV irradiation of
DNA. © 1994 John Wiley & Sons, Inc.


Introduction 

Studies of low-temperature structural transitions in DNA solutions are of great
interest to biophysics. To study such transitions, we investigated the structural
changes in aqueous solutions of DNA in the absence and presence of alcohols. The
studies were carried out in the range from 4.2 to 273 K using the phosphorescent
probe method.

Earlier, the method of luminescence probes was tested in studies of proteins and
membranes [1,2]. In the present work, 6-thioguanine (6SG) or its nucleoside was
used as the phosphorescent probe. We studied their electronic structures in
detail earlier [3-5]. 6SG is known as an antitumor drug used in the treatment of
leukemia [6].


Experiment 

6-Thioguanine (6SG) and 6-thioguanosine (6SGR) were synthesized at the
Institute of Organic Synthesis, Latvian Academy of Sciences. (6SGR is a fixed N9H
tautomer of 6SG.) 6-Thiopurine-riboside was purchased from the Serva Co. The


International  Journal  of  Quantum  Chemistry:  Quantum  Biology  Symposium  21,  187-194  (1994) 

©  1994  John  Wiley  &  Sons,  Inc.  CCC  0360-8832/94/010187-08 



RUBIN  ET  AL. 


alcohols (ethanol, glycerol, propanediol [PD]) and dimethyl sulfoxide (DMSO)
were distilled and dried beforehand. DNA from E. coli was used, into which 6SG
was introduced by the biosynthesis method [7]. The concentrations were 10⁻⁵ M
6SGR in alcohols and 10⁻³ M DNA in solution. The relative 6SG concentration
in DNA was 1 nucleotide with 6SG per 3000 or 600 major nucleotides. The DNA
solution contained 0.015 M NaCl and 0.0015 M sodium citrate. To obtain denatured
DNA, its initial solution was kept in a boiling water bath for 30 min and then
cooled down to 4°C.

DNA films stained with 6-thiopurine-riboside were also studied. The spectral
features of the latter are close to those of 6SG [8]. The pre-dried film was
kept in an atmosphere over a KCl aqueous solution of the required concentration,
which provided the necessary humidity (80%).

To prepare experimental samples, a special cuvette (volume 0.3 mL) with the
prepared solution was quickly dipped into liquid nitrogen. Then, the sample was
cooled down to liquid helium temperature. The rate of cooling was 6-9 K/min
at 4.2-77 K and that of heating was 0.7-1 K/min in the range 4.2-273 K. The
temperatures were measured with a differential copper-constantan thermocouple.

The phosphorescence spectra and the temperature dependence of the probe's
phosphorescence intensity were obtained using a laboratory luminescence setup
incorporating a nitrogen-helium cryostat and a microcomputer. The exciting light
wavelength was 350 nm, the observation wavelength being 485 nm in the studies
of the temperature dependencies of probe emission.
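
For comparison with the excited-state energies quoted in eV elsewhere in this article, the excitation and observation wavelengths can be converted to photon energies with E = hc/λ. The sketch below is a simple unit conversion, not part of the original procedure; the constant `HC_EV_NM` (hc in eV·nm, rounded) and the function name `photon_energy_ev` are choices made for this example.

```python
# h*c expressed in eV*nm (rounded to two decimals).
HC_EV_NM = 1239.84


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a wavelength given in nm (E = hc / lambda)."""
    return HC_EV_NM / wavelength_nm


# Excitation and observation wavelengths quoted in the text.
for label, wavelength in (("excitation", 350.0), ("observation", 485.0)):
    print(f"{label}: {wavelength:.0f} nm = {photon_energy_ev(wavelength):.2f} eV")
```

On this scale the 350 nm excitation corresponds to roughly 3.5 eV and the 485 nm observation wavelength to roughly 2.6 eV.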

Results  and  Discussion 

Earlier, we carried out theoretical (CNDO/S methods) and experimental
(absorption and luminescence spectroscopy, circular dichroism) studies on the
electronic structure of 6SG [3-5]. The results showed, in particular, that UV absorption
spectra have an intense long-wavelength band (Fig. 1) that is due to the
intramolecular transfer of an electron from the sulfur atom to the pyrimidine ring. The
luminescence spectrum shows essentially phosphorescence. The CNDO/S method
was used to calculate the excited-state energies (Table I) and the elements of the
electronic structure (atomic charges, bond orders, spin densities) for the ground
and excited states of the neutral and ionic forms of 6SG. The atomic contribution
of the sulfur atom to the resonance integral was taken to be 14 eV and the
single-center electron-repulsion integral was set at 7 eV. These parameters were derived
from calculations for a large group of sulfur-containing compounds (the ionic
and tautomeric forms of 4SU, 2SU, 2,4SU, and others) [5].

A comparison of the total atomic charges of guanine and 6SG shows their similarity
in the ground state. This permits the N9H tautomer of 6SG to be incorporated into DNA.
The changes in the total atomic charges on the transitions to the Sππ*, Tππ*, and nπ*
electronic states occur mainly at the C6 and S10 atoms. The largest change in the
bond orders occurs for the C=S bond. This suggests that excitation to the above
states is localized at the C=S fragment, which makes it the most photoreactive.
According to theoretical calculations, the dipole moment of the neutral 6SG molecule
decreases in the excited states: from 10.80 D in the ground state to 6.69 D in
the Sππ* state and 6.67 D in the state.

Figure 1. (1) Absorption and (2) phosphorescence spectra of the 6SG probe (N9H tautomer,
neutral molecule) in ethanol at 77 K. Top: Localization of excitation and the excited-state
diagram.

The analysis of spectroscopic and theoretical data permits the plotting of a diagram
of the excited states of the neutral 6SG molecule (Fig. 1). The diagram shows, in
particular, that 6SG can be a suitable phosphorescence probe to study structural
changes and transitions in DNA and low-molecular-weight compounds over a wide
temperature interval.

Using 6SG incorporated into DNA as a phosphorescent probe, we studied the
temperature dependencies of the probe emission in water DNA solutions and in
solutions with added PD and DMSO in a wide temperature interval. Since the
temperature dependencies of the probe emission in DNA are complex in shape,
we studied the temperature dependencies of our probe emission in simple systems:
ethanol, glycerol, and PD.

Table I. Calculated (Calc.) and experimental (Exp.) excited-state energies (E) and oscillator
strengths (f) of electronic transitions of different ionic and tautomeric forms of 6SG.

Form of molecule    E(Sππ*) (eV)    f       E(nπ*) (eV)  E(Tππ*) (eV)
                    Calc.   Exp.                         Calc.   Exp.
Cation              3.29    3.56    0.50    2.43         2.12    2.59
Neutral N9H         3.68    3.61    0.47    3.01         2.71    2.70
Neutral N7H         3.67    3.40    0.43    3.19         2.73    2.59
Anion N9H           4.00    3.92    0.33    3.73         3.23    2.87
Anion N7H           4.09    —       0.32    3.93         3.26    2.80
Dianion             3.73    3.87    0.36    3.39         2.93    2.80

The temperature dependencies of the 6SGR phosphorescence intensity, obtained
on heating in glycerol, ethanol, and PD (Fig. 2), show that the probe luminescence
intensity changes when the aggregate state of the alcohol changes [9-12]: on
devitrification and on the formation and melting of crystals. A strong decrease in
luminescence (phosphorescence quenching) is observed in the range of 100-120 K in
ethanol, 150-200 K for PD and its solutions, and 180-190 K for glycerol. The 6SGR
luminescence also changes in ethanol at 125-150 K, which correlates with the
formation and melting of ethanol crystals, as calorimetric data suggest [10,12].
A comparison of the calorimetric data for alcohols [9-12] with the temperature
dependencies of the 6SGR emission intensity (Fig. 2) shows that the strongest
decrease in the luminescence intensity (phosphorescence quenching) is observed
when the liquid phase appears.

Figure 2 also shows that, between 4.2 K and the onset temperature of
phosphorescence quenching, the luminescence intensity in the solid-state alcohols
decreases by a factor of 1.5-3 in the ethanol-PD-glycerol series. This may be due to
slow, thermally activated motion of molecular fragments or of whole molecules of
the matrix (glass) [13].

We also studied the temperature dependencies of the 6SGR phosphorescence
intensity in a 15% aqueous PD solution and in a 15% DMSO solution. In these
experiments, phosphorescence quenching was observed in the devitrification ranges
of the amorphous phases of solutions of the same concentration: near 165 K for PD
and near 130 K for DMSO [9,14].

After this preliminary analysis of the probe emission thermograms in simple
systems, we could examine the emission thermograms of the probe incorporated
into DNA. As seen in Figure 3, the temperature dependencies of the 6SG
phosphorescence in native DNA (nDNA) have specific features at 21, 64, 87, 140, 182,
and 268 K. The amplitude of the effect observed at 87 K decreases after the sample
is annealed. The phosphorescence quenching of 6SG in DNA is observed at 182-215 K.
The temperature interval in which the 6SG phosphorescence increases on cooling
of nDNA is close to the interval in which it is quenched on heating. It is also seen
that the interval of the phosphorescence quenching of 6SG in denatured DNA shifts
toward lower temperatures compared with that of nDNA. Whereas nDNA has
features at 64, 87, 140, 182, and 268 K, denatured DNA has them at 90,
140, and 268 K. The slope of the temperature dependence of the probe luminescence
in DNA at 21-140 K is significantly smaller than that of the corresponding curves
taken in alcohols (cf. Figs. 2 and 3).

Figure 2. The temperature dependencies of the 6SGR probe emission intensity: (1) in
ethanol; (2) PD; (3) glycerol. Abscissa: temperature (°C).

Figure 3. The temperature dependencies of the emission of 6SG incorporated in DNA.
(1) Warming and (2) cooling of native DNA; (3) warming of denatured DNA. Rates of
cooling/warming were 1 K/min.

Figure 4 shows that the 6SG phosphorescence quenching in DNA in 15% PD is
observed at lower temperatures (160-180 K). This temperature interval is
characteristic of devitrification of PD and its solutions (cf. Fig. 2). The temperature
dependence of 6SG in DNA in the PD solutions shows features at 9, 75,
and 160 K. In the aqueous DNA solution with added DMSO, the probe phosphorescence
quenching is also observed at lower temperatures. The range from 120 to 170 K is
close to the region of devitrification of aqueous DMSO solutions [14].
Preliminary studies on DNA films stained with 6-thiopurine-riboside (RH 80%)
show that the probe's phosphorescence quenching interval is shifted to
higher temperatures (190-245 K).

Analysis of the X-ray diffraction data for the B-form DNA structure at 16
and 293 K [15,16] and of the data for DNA with 6SG in place of G [6,7] leads to the
conclusion that the C=S fragment of the 6SG molecule, where the excitation is
localized, lies within the major groove of DNA in aqueous solutions.

In DNA films, the probe and its C=S fragment are located nearer to the
phosphate groups or sugars than to the bases because the phosphates and sugars form
about 80% of the surface of the DNA double helix [17]. Perhaps the above effect
in DNA films explains the shift of the interval of the probe phosphorescence
quenching to the higher-temperature region because the binding energy of
phosphates to water is known to be higher than that of the bases [17].

Figure 4. The temperature dependencies of the emission of 6SG incorporated into DNA
under different conditions: (1) water solution; (2) 15% PD solution; (3) 15% DMSO solution;
(4) hydrate film of DNA (probe 6-thiopurine-riboside was not incorporated into DNA,
but was placed in the DNA hydrate shell).

The DNA calorimetry data [18,19], the 6SGR phosphorescence quenching in alcohols
when a liquid phase appears (Fig. 2), and the similarity of the temperature
intervals in which the phosphorescence intensity increases on cooling and is
quenched on heating of nDNA solutions together suggest that
the 6SG phosphorescence quenching in native DNA at 180-215 K (Fig. 3) is due
to the unfreezing of water molecule motions (devitrification) in the
major groove of DNA. This is also supported by the shift of the temperature interval
of the phosphorescence quenching when PD and DMSO are added to the aqueous DNA
solutions.

Perhaps  the  increasing  luminescence  intensity  of  the  aqueous  nDNA  solution  at 
87  K,  which  decreases  on  sample  annealing,  is  due  to  the  sample  cracking.  Earlier, 
we  observed  the  increase  in  the  probe  luminescence  intensity  on  the  cracking  of 
transparent  alcohol  glasses  with  a  probe.  The  factors  causing  these  peculiarities  at 
21,  64,  and  140  K  are  being  studied  now. 

We also studied the luminescence spectra of aqueous DNA solutions with different
concentrations of the probe in DNA. DNA containing 1 probe molecule per 3000 and
per 600 major nucleotides and DNA without the probe were studied (Fig. 5).
The analysis of the luminescence spectra of the DNA solution
at λex = 280 nm shows that a change from DNA without the 6SG probe to DNA
containing one 6SG residue per 3000 and 600 major nucleotides leads to a decrease
in the DNA fluorescence intensity at 330-380 nm and a simultaneous considerable
increase in the luminescence at 450-500 nm. The emission intensity in the range
450-500 nm appreciably exceeds the value expected for the 6SG emission when the
6SG concentration and the efficiency of its absorption at λex = 280 nm are taken
into account. Figure 5 also shows that there is good overlap between the
fluorescence spectrum of DNA and the absorption spectrum of 6SG. The calculation
of the Förster radius for excitation transfer gives R0 = 28 Å. Both these results permit
us to assume the induction-resonance mechanism of excitation transfer from the
DNA molecule to 6SG under UV irradiation of the DNA, although energy migration
by the exchange-resonance mechanism is also possible. These
observations are of significant interest because 6SG acts as an antitumor drug. Our
experiments show that the chemotherapeutic action of 6SG can be modified
(intensified) by additional UV irradiation of cancer cells.

Figure 5. The luminescence spectra of DNA in water solution at 77 K: (1) DNA without
the probe; (2) DNA with the 6SG probe (1/3000); (3) DNA with the 6SG probe (1/600);
(4) absorption spectrum of the 6SG probe.
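The Förster radius quoted in the discussion of Figure 5 sets the distance scale of induction-resonance transfer. As a minimal sketch, assuming a single donor-acceptor pair and the standard Förster expression E = 1/(1 + (r/R0)^6), the transfer efficiency can be tabulated against distance. The function name and the sample distances are illustrative and do not come from the paper.

```python
def forster_efficiency(r_angstrom: float, r0_angstrom: float = 28.0) -> float:
    """Transfer efficiency E = 1 / (1 + (r / R0)**6) for one donor-acceptor pair.

    r0_angstrom defaults to the R0 = 28 Angstrom reported in the text.
    """
    if r_angstrom < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (1.0 + (r_angstrom / r0_angstrom) ** 6)


if __name__ == "__main__":
    # Illustrative distances: half of R0, R0 itself, and twice R0.
    for r in (14.0, 28.0, 56.0):
        print(f"r = {r:5.1f} A -> E = {forster_efficiency(r):.4f}")
```

At r = R0 the efficiency is one half by definition, and the sixth-power dependence makes it fall to about 1.5% at twice that distance.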

Conclusions 

1. 6-Thioguanosine (6SGR) is a suitable probe for studying low- and
high-molecular-weight systems.

2. Phosphorescence intensity curves for 6SG in native DNA show
features at 21, 64, 87, 140, 182, and 268 K.

3. Vitrification (devitrification) of the hydrate shell of the major groove of DNA
takes place in the range from 180 to 215 K. The cryoprotectors DMSO and PD,
added to the aqueous DNA solution, lower the vitrification temperature of the DNA
hydrate shell.

4. There is effective energy transfer to the probe in 6SG-containing DNA
under UV irradiation.
Abstract

Chemical  and  biological  damage,  caused  by  directly  or  indirectly  ionizing  radiations,  is  attributable 
to  the  action  of  the  charged  particle  tracks  in  the  absorbing  medium.  Attempts  to  elucidate  the  biophysical 
mechanisms  involved,  and  to  quantify  the  damage,  are  typically  made  in  terms  of  one  or  more  of  the 
main  physical  parameters  descriptive  of  the  charged  particle  tracks.  To  meet  a  need  for  a  ready  reference 
source  of  such  information,  tables  of  the  relevant  parameters  have  been  calculated  for  a  liquid  water 
medium.  The  full  tables  are  obtainable  elsewhere.  Here,  a  description  is  given  of  the  quantities  calculated 
and  an  extended  example  is  given  of  their  application  in  elucidating  the  physical  mechanisms  of  radiation- 
induced  biological  damage.  A  representative  selection  of  data  is  displayed  graphically  to  illustrate  the 
extent  of  the  information  obtained  and  its  value  in,  e.g.,  application  to  fundamental  radiation  dosimetry. 
Track structure data are tabulated for instantaneous energies of individual particles and for the fluence
and dose-weighted spectra at charged particle equilibrium. Data are listed for incident electrons (50 eV
to 30 MeV); characteristic Kα X-rays from carbon to uranium; commonly used radioisotope sources of
241Am, 137Cs, and 60Co and for continuous X-ray spectra (<300 kV); Auger electron and beta-emitter
radionuclides; heavy charged particles having specific energies of 0.5 keV/u to 1 GeV/u for 74 ion types
ranging from protons to uranium ions, and for monoenergetic neutrons (0.5 keV to 100 MeV). Quantities
listed are kerma factors; fluence of charged particles per unit source concentration; buildup factors; track
and dose-average LET and restricted LET; W values; z²/β²; β²; delta-ray yields, energies, and ranges; ion
ranges; and the mean free path for primary ionization and the linear primary ionization. For indirectly
ionizing radiations, the microdose quantities (frequency and dose means of lineal energy) are tabulated
along with typical energy deposition distribution spectra for neutrons and gamma rays in micron and
nanometer volumes. © 1994 John Wiley & Sons, Inc.


1.  Introduction 

There is a continuing need for a comprehensive set of reference tables containing
parameters descriptive of properties of the charged particle tracks generated directly
or indirectly by ionizing radiations in biological materials. Applications are in diverse
fields such as interpretation of biological damage mechanisms for radiological
protection; modeling of radiation effects; heavy charged particle therapy; radiation
dosimetry for astronauts and for computer components in space vehicles; assessment
of effects of incorporated radionuclides in nuclear medicine; design of instrumental
response for quality and dose specification; study and interpretation of
inhomogeneous reaction-rate kinetics of induced radicals in aqueous media; and study of
damage induced in organic macromolecules. Here, there is not space to reproduce
the extensive sets of tables [1-3] but a brief description is given of the main
parameters tabulated and of their physical meaning. Examples of the results for some of
the quantities are displayed graphically. The results, calculated for a liquid water
medium (representative of biological material), allow indirectly for physical phase
and chemical-bonding effects.

A practical example is given of the use of the parameters in identifying
fundamental mechanisms involved in the biological effectiveness of heavy charged
particles. The tables entitled “Track structure data for ionizing radiations in liquid
water” are in three main parts: Part 1 is on electrons and photons (50 eV to 30
MeV); Part 2 is on 74 types of heavy charged particles (100 eV/u to 1 GeV/u)
ranging from protons to uranium ions; and Part 3 is on neutrons (0.5 keV to 100
MeV). The complete tables are available from the authors on request [1-3].

2.  Details  of  the  Calculated  Track  Structure  Parameters 
and  Their  Physical  Interpretation 

2.1.  Part  1(a):  Electrons  at  Energies  50  eV  to  30  MeV 

Parameters calculated in the continuous slowing down approximation (CSDA)
are the collisional linear energy transfer, LET (L∞); the track and dose-average
restricted LET (L100); the linear primary ionization (I) and the mean free path (λ)
between primary ionizations; the CSDA range (R); and the kerma factor (Kf). Results
are given for electrons at instantaneous energies (e.g., for track segment experiments)
and for electrons in charged particle equilibrium. The CSDA range of primary
electrons has a dual meaning as it is also equivalent to the primary electron fluence
per unit source concentration of electrons. Also given are the mean energies and
ranges of the generation of secondary electrons in the equilibrium spectrum in
addition to the parameters analogous to those given for the primary electrons and
the relative variances of the dose and track LETs. The relative variance quantifies
the width of the distribution.
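The averaged quantities named above can be illustrated with a short numerical sketch. It assumes the usual definitions: the track-average LET is the fluence-weighted mean of L, the dose-average LET is the ratio of the second to the first fluence-weighted moment, and the relative variance of the track (fluence-weighted) distribution is then L_D/L_T - 1. The function name and the sample spectrum are hypothetical and are not taken from the tables.

```python
def let_averages(let_values, fluence_weights):
    """Return (track-average LET, dose-average LET, relative variance).

    let_values      : LET of each spectrum bin (e.g. keV/um)
    fluence_weights : relative fluence (track length) in each bin
    The relative variance returned is that of the fluence-weighted
    distribution, (<L^2> - <L>^2) / <L>^2 = L_D / L_T - 1.
    """
    if len(let_values) != len(fluence_weights) or not let_values:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(fluence_weights)
    first = sum(w * l for l, w in zip(let_values, fluence_weights))
    second = sum(w * l * l for l, w in zip(let_values, fluence_weights))
    l_track = first / total    # fluence-weighted mean
    l_dose = second / first    # dose-weighted mean
    return l_track, l_dose, l_dose / l_track - 1.0


if __name__ == "__main__":
    # Made-up two-bin spectrum, for illustration only.
    print(let_averages([1.0, 3.0], [1.0, 1.0]))
```

Because the dose-average weights each bin by the energy it deposits, it is never smaller than the track-average, and the two coincide only for a monoenergetic spectrum.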

Parameters  listed  that  could  have  special  significance  in  radiation  chemistry  are 
the  space  density  of  primary  electrons  in  the  initial  track;  the  total  space  density  of 
all  electrons  per  unit  source  concentration  of  initial  electrons;  and  the  total  electron 
fluence  and  factors  for  the  buildup  of  electron  concentration  and  of  electron  fluence. 
Approximate  values  are  listed  for  the  mean  energy  ( W)  expended  in  producing  a 
primary  ionization  along  primary  electron  tracks.  Figure  1  shows  the  CSDA  ranges 
for  electrons  in  liquid  water. 

2.2. Part 1(b): Photons

Data, similar to those listed above for electrons, are tabulated for

(i) Characteristic (fluorescent) X-ray energies for 27 elements ranging from carbon
to uranium. Figure 2 shows the relevant track-average LET, the dose-average LET,
and the mean free path, λ, for primary ionization, for the equilibrium electron
spectra generated by fluorescent Kα X-rays.

(ii) Orthovoltage continuous (medical) X-ray energy spectra for nine applied
potentials ranging from 50 to 300 kV. Frequency-weighted and kerma-weighted
mean photon energies, the mass energy transfer coefficients, the electron production
cross sections, and the mean energies of photoelectrons and of Compton electrons
are listed. The mass energy transfer coefficient combined with the initial photon
energy, and the electron production cross sections combined with the electron
spectrum, enable the validity of the calculation to be tested, as each should yield the
same kerma factor (conservation of energy). This agreement is confirmed to within 5%.

(iii) Radioisotope sources. Results are presented in the tables for Am-241 (59.6
keV γ), Cs-137 (661 keV γ), and Co-60 (1.17 and 1.33 MeV γ's).

(iv) Beta, Auger electron, and X-ray emitting radionuclides, viz., I-125, I-131,
H-3, C-14, P-32, P-33, and Br-77. These radionuclides have wide application as
tracers, as incorporated radionuclides, and in labeling of organic macromolecules.
There is evidence that Auger-electron emitters are especially damaging.

The results are accurate to within 5% at electron energies above 10 keV, but the
uncertainty increases progressively to ~10% at 1 keV and may be as large as
25% at 100 eV.
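The energy-conservation check described under item (ii) can be sketched numerically. This is a minimal illustration, assuming the kerma factor is expressed per unit photon fluence, that the first route is E times the mass energy transfer coefficient, and that the second route sums mass cross section times mean electron energy over the production channels. All function names and input numbers are hypothetical; the numbers are made up and are not values from the tables.

```python
MEV_PER_G_TO_GY = 1.602176634e-10  # 1 MeV/g expressed in Gy (J/kg)


def kerma_factor_from_mu_tr(photon_energy_mev, mu_tr_over_rho_cm2_g):
    """Kerma per unit photon fluence (Gy cm^2) = E * (mu_tr / rho)."""
    return photon_energy_mev * mu_tr_over_rho_cm2_g * MEV_PER_G_TO_GY


def kerma_factor_from_electrons(channels):
    """Kerma per unit photon fluence (Gy cm^2) from electron production.

    channels: iterable of (mass cross section in cm^2/g,
                           mean electron energy in MeV),
    one entry per process, e.g. photoelectric and Compton.
    """
    return sum(sigma * e_mean for sigma, e_mean in channels) * MEV_PER_G_TO_GY


def relative_difference(a, b):
    """Relative difference between two positive kerma factors."""
    return abs(a - b) / max(abs(a), abs(b))


if __name__ == "__main__":
    # Made-up inputs for illustration only.
    k1 = kerma_factor_from_mu_tr(0.100, 0.0255)
    k2 = kerma_factor_from_electrons([(0.0020, 0.100), (0.1600, 0.0147)])
    print(k1, k2, relative_difference(k1, k2) < 0.05)
```

The 5% threshold in the usage lines mirrors the agreement quoted in the text; with real tabulated coefficients the two routes would be compared in the same way.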

2.3.  Part  2:  Heavy  Charged  Particles/Accelerated  Ions 

Parameters for heavy charged particles, having specific energies between 1 keV
per unit mass number and 1 GeV per unit mass number, have been calculated for
74 ion types ranging from protons to uranium ions. The ion types selected are
those typically used in biological and chemical experiments with accelerated ions.
Also included are light ions (lithium to oxygen isotopes) produced as typical
fragmentation products in targets of low atomic number when bombarded by relativistic
protons. The data have important application in space dosimetry and in high-LET
therapy. The calculations are based on Berger’s recently revised results for stopping
of protons and alpha particles (ICRU Report No. 49) [4] in liquid water. For fast
ions, the track structure quantities calculated and tabulated as a function of specific
energy have their origin in the Bohr/Bethe theories of stopping power (e.g., ICRU-37)
[5], viz.:

$$\frac{1}{\rho}\,L_\infty \;\propto\; \frac{z^2}{\beta^2}\,F(\beta^2).$$

β², the dimensionless ion velocity in atomic units, determines the maximum spatial
distribution of delta rays around the ion track. z²/β², the ratio of the effective charge
to velocity, can be interpreted either as the linear yield of primary ionizations and/or
the linear yield of delta rays along the track. I, the linear primary ionization, is
the zeroth moment of energy transfer. Its reciprocal is the mean free path, λ, for
primary ionization. L∞, the collisional linear stopping power and the first moment
of energy transfer, includes the kinetic energy of the radially distributed delta rays.
The delta-ray properties depend only on the specific energy of the ions and are
independent of ion type. A separate table on delta rays contains average and
maximum delta-ray energies and their corresponding CSDA ranges. These are illustrated
in Figure 3. Ion ranges are shown in Figure 4. W values for primary ionization are
tabulated. Analysis of the radiation effect for a specified end-point and ion type, in
terms of the three quantities β², z²/β², and L∞, should enable the components of
the total damage to be linked to the causal physical action, i.e., respectively, the
effects attributable to the spatial spread and number of delta rays, the spacing of
events, and the components of energy transfer. Restricted LET with 100 eV
delta-ray energy cutoff is also of interest in this context as it provides a measure of the
energy deposition along a cylindrical “core” of about 4 nm radius along the ion
track. The remaining energy loss is in the delta-ray penumbra. Examples of the
trends of the parameters are given in Figures 5-7. In radiation dosimetry, similar
quantities are required, but averaged over the charged particle equilibrium
spectrum. Thus, the tables also contain the track-averaged and dose-averaged
LETs and their restricted versions. The data tabulated for instantaneous specific
energies of ions are applicable to track segment experiments. Values for the
equilibrium spectrum of ions are applicable to irradiations where the whole track stops
in the biological material or where charged particle equilibrium may be achieved
as, e.g., in fast neutron irradiations.
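The velocity-dependent quantities discussed in this section can be evaluated from the specific energy alone. The sketch below is an illustration under stated assumptions: β² follows from relativistic kinematics with one atomic mass unit taken as 931.494 MeV, and the effective charge uses the Barkas expression z* = Z[1 - exp(-125 β Z^(-2/3))]. The text does not say which effective-charge formula the tables use, so that choice, like the function names, is an assumption made here.

```python
import math

AMU_MEV = 931.494  # rest energy of one atomic mass unit in MeV


def beta_squared(specific_energy_mev_per_u):
    """beta^2 = 1 - 1/gamma^2, with gamma = 1 + T / (u c^2)."""
    gamma = 1.0 + specific_energy_mev_per_u / AMU_MEV
    return 1.0 - 1.0 / (gamma * gamma)


def effective_charge(z, beta2):
    """Barkas-type effective charge (assumed form, not taken from the source)."""
    beta = math.sqrt(beta2)
    return z * (1.0 - math.exp(-125.0 * beta * z ** (-2.0 / 3.0)))


def z2_over_beta2(z, specific_energy_mev_per_u):
    """Effective charge squared divided by beta^2 at a given specific energy."""
    b2 = beta_squared(specific_energy_mev_per_u)
    return effective_charge(z, b2) ** 2 / b2


def mean_free_path_nm(linear_primary_ionization_per_nm):
    """Mean free path for primary ionization, the reciprocal of I."""
    return 1.0 / linear_primary_ionization_per_nm


if __name__ == "__main__":
    for z in (1, 2, 6):
        print(z, z2_over_beta2(z, 1.0))  # 1 MeV/u, illustrative
```

Because β² depends only on the specific energy, ions of different type at the same energy per mass unit share the same velocity and differ only through the effective charge.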

2.4.  Part  3:  Neutrons  (0.5  keV  to  100  MeV) 

In calculating the neutron data, only elastic collisions are considered. No
allowance is made for (n, α) reactions in oxygen (neutron threshold energy ~ 6.3 MeV)
or for multiparticle breakup reactions (neutron threshold energy ~ 20 MeV). For
radiological protection, the neutron energies most commonly experienced are in
the keV to 15 MeV energy range. Biophysical data calculated for both the H and
O recoil components produced by neutron interactions in water are the recoil source
density, the partial kerma factors, and the equilibrium fluence of recoils of H and
O per unit incident neutron fluence. Track structure quantities listed are those
indicated for the other radiations described above plus maximum and average recoil
energies, their projected ranges, the maximum and average delta-ray energies, and
ranges associated with the recoil tracks.

Figure 4. Ranges of accelerated ions in liquid water. The range of an ion must be ≳ 20
µm to ensure track traversal of a mammalian cell. The vertical arrows along the abscissa
indicate specific energies corresponding to an ionization mean free path of 2 nm, at which
RBEs will be maximum. Abscissa: E/A (keV/amu).

Figure 5. Mean free path for ionization is shown as a function of specific energy for
the ion types indicated. For the study of damage mechanisms in biophysics, only light
accelerated ions (A < 20) are appropriate; otherwise, basic mechanisms of interest will be
obscured in the large background effects due to saturation damage. Ordinate: mean free
path for ionization, λ (nm); abscissa: E/A (keV/u).

Figure 6. LET and restricted LET, L100, for accelerated ions in liquid water. Abscissa:
E/A (keV/amu).

In recent years, the subject of microdosimetry, which takes into account the
stochastic nature of energy deposition, has been applied in attempts to interpret
radiation effects and mechanisms in small sites [39]. Site sizes were typically of
micron dimensions to represent the dimensions of biological cell nuclei and of
inhomogeneous rate processes in biochemistry. More recently, the possible
importance of nanometer sites in macromolecular targets has been recognized, e.g., in
the DNA. Relevant information on the frequency and dose distributions and on
the corresponding mean lineal energies along with the relative variances is provided.
Figure 8 shows the track and dose-weighted mean lineal energies in water for the
H recoil component generated by monoenergetic neutrons in a site of 1 µm diameter.
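The lineal-energy means mentioned above can be sketched for a list of single-event energy deposits. The example assumes a spherical site, for which the mean chord length is two-thirds of the diameter, and the usual definitions y = ε / l̄, y_F as the event-weighted mean, and y_D as the ratio of the second to the first moment. The function names and the sample deposits are hypothetical.

```python
def mean_chord_length_um(site_diameter_um):
    """Mean chord length of a sphere, 2d/3 (Cauchy's relation 4V/S)."""
    return 2.0 * site_diameter_um / 3.0


def lineal_energy_means(energy_deposits_kev, site_diameter_um):
    """Return (y_F, y_D) in keV/um for equally weighted single events."""
    if not energy_deposits_kev:
        raise ValueError("need at least one event")
    chord = mean_chord_length_um(site_diameter_um)
    y = [e / chord for e in energy_deposits_kev]
    y_f = sum(y) / len(y)                  # frequency mean
    y_d = sum(v * v for v in y) / sum(y)   # dose mean
    return y_f, y_d


if __name__ == "__main__":
    # Made-up deposits of 2 and 4 keV in a 1 um diameter site.
    print(lineal_energy_means([2.0, 4.0], 1.0))
```

Shrinking the site from micron to nanometer dimensions changes only the chord length in this sketch; in practice the event spectrum itself also changes with site size.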

3.  Application  of  Track  Structure  Parameters  in  the  Interpretation 
of  Damage  to  Mammalian  Cells 

Figure 8. Frequency and dose-weighted lineal energies, yF (keV/µm) and yD (keV/µm), are
shown for monoenergetic neutrons in liquid water. Abscissa: neutron energy (keV).

Figure 9. Interpretation of damage mechanisms for inactivation of mammalian cells. (a)
is shown to be more meaningful and the damage coefficients better correlated as a
function of ionization mean free path, λ nm, than (b) as a function of L. Data are extracted
from [6-38].

Damage by ionizing irradiations of mammalian cells is commonly expressed in
terms of the surviving fraction as a function of absorbed dose for a specified biological
end-point. The information obtained is of value for interpretation of damage
mechanisms implicit in the development of models of radiation action used in radiological
protection. Good models are required to permit reliable extrapolation of effects
from the high dose levels used in laboratories to the lower doses and dose rates
occurring near environmental levels and experienced by the majority of the
population. Parameters can be identified that are of value in quantifying, e.g., the risk
of radiation-induced cancer. Figure 9(a) shows a compilation of results, extracted
from the literature, for the initial slope α (in Gy⁻¹) as a function of mean LET,
L (keV/µm), for mammalian cell survival after irradiation with a wide selection of
heavy ion types. This graph can be considered as a representation of the currently
accepted system of dosimetry used to determine exposure limits in safety legislation.
Close examination of the plotted curve shows that the saturation value of α occurs
at different values of L, depending on charged particle type, and, consequently,
there is no possibility of obtaining a unified curve with these parameters. The
currently accepted system of dosimetry is therefore fundamentally unsound. Since the
cross section for induction of the specified biological end-point represents the
probability of the effect per unit fluence of tracks, a more appropriate procedure is to
compare the effect cross section with the track parameters of interest. If a “unified”
plot can be obtained for a wide range of radiation types, then the relevant parameter
must surely be of significance. The effect cross section, σeff, is related to the initial
slope of the dose-survival curve by

$$\sigma_{\mathrm{eff}}\,(\mathrm{cm}^2) = \frac{1.6 \times 10^{-9}\; L_T\,(\mathrm{keV}/\mu\mathrm{m})}{\rho\,(\mathrm{g/cm^3})\; D\,(\mathrm{Gy})}$$

where LT is the track-average LET; D, the absorbed dose; and ρ, the density. For
the analysis, the initial slope is selected to avoid any complications due to subsequent
recovery of damage that may be associated with cellular repair processes. The data
used in Figure 9(a) are transposed in Figure 9(b) to a σeff-λ plot, using the
equation above. Here, λ is the mean free path for primary ionization along the
charged particle tracks. Several interesting factors are revealed. There is a significantly
better correlation of results onto a unified curve. There is a common point of
inflexion, for all charged particle types, at a mean free path of 2 nm, which, on
various arguments, leads to the conclusion that the key lesion is damage to the
DNA. The spacing of events, rather than energy deposition, is the mechanism
involved. As the σ-λ plot is better correlated than is the α-L plot, the conclusion
is that delta rays (the energy of which is included in the LET) must have much
reduced, probably negligible, effect except in the highly saturated region. This finding
challenges the physical basis of the system of “dose-limitation” used in radiation
protection and the assumption that energy deposition is the important quantity in
determining damage. Numbers and the correlated spacing of pairs of interactions,
whether direct or indirect, are found to be the key factors that are common to all
radiation types.
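The conversion from an initial slope to an effect cross section can be sketched as follows. The example assumes that D in the relation above is the dose producing on average one effective event per cell, so that D = 1/α for an initial slope α in Gy⁻¹; that reading, the function names, and the sample numbers are assumptions made for illustration. The constant 1.6 × 10⁻⁹ is the rounded factor linking dose in Gy to LET in keV/µm times fluence in cm⁻² divided by density in g/cm³.

```python
import math

GY_PER_KEV_UM_CM2 = 1.6e-9  # dose (Gy) = 1.6e-9 * LET (keV/um) * fluence (cm^-2) / rho (g/cm^3)


def effect_cross_section_cm2(alpha_per_gy, let_kev_per_um, density_g_cm3=1.0):
    """sigma_eff = 1.6e-9 * L_T * alpha / rho, taking D = 1/alpha (assumption)."""
    return GY_PER_KEV_UM_CM2 * let_kev_per_um * alpha_per_gy / density_g_cm3


def fluence_per_cm2(dose_gy, let_kev_per_um, density_g_cm3=1.0):
    """Track fluence that delivers a given dose at a given LET."""
    return dose_gy * density_g_cm3 / (GY_PER_KEV_UM_CM2 * let_kev_per_um)


def cm2_to_um2(area_cm2):
    """Convert an area from cm^2 to um^2."""
    return area_cm2 * 1.0e8


if __name__ == "__main__":
    alpha, let, dose = 1.0, 100.0, 2.0  # made-up values for illustration
    sigma = effect_cross_section_cm2(alpha, let)
    print(cm2_to_um2(sigma))
    # Survival per unit fluence and per unit dose describe the same curve.
    print(math.exp(-sigma * fluence_per_cm2(dose, let)), math.exp(-alpha * dose))
```

The last two printed numbers coincide because exp(-σΦ) and exp(-αD) are the same initial-slope survival expressed per track and per unit dose.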
Abstract

The molecular transform index (FTm) is a unitary representation of a molecule based on a Fourier
operation on the bond distance or graphical descriptor matrices of the structure while incorporating the
atomic number of the constituent atoms. In a series consisting of phosphonates, phosphonothionates,
and a phosphinate, the FTm gave an excellent linear correlation (R = 0.91) with experimentally determined
octanol/water partition coefficients (log Po/w). In a second group containing phosphonofluoridates,
thionophosphonofluoridates, and phosphoramidofluoridates, the FTm correlation with log Po/w, calculated
by  the  ^-fragment  method,  served  to  separate  the  fluoridates  and  amidates  as  structural  subclasses. 
© 1994 John Wiley & Sons, Inc.*


Introduction 

The molecular transform index (FTm) is a unitary numerical representation of
a molecular structure. It is determined in four steps: performing a Fourier
operation on any one of several possible structure representations to give a curve
analogous to that of free induction decay, squaring the points on the curve to make
all the amplitudes positive, integrating the area under the curve, and then taking
the square root of the area. The structure may be represented by a matrix of Cartesian
coordinates or virtually any graphical depiction such as adjacency or distance
matrices, and the latter may be either distances between bonded atoms only
(two-dimensional) or between all atoms (three-dimensional). In essence then, the
generation of the index is a mapping-down process from a two- or three-dimensional
to a one-dimensional space. However, the structure is even more precisely defined
by including the atomic number of the constituent atoms in the original Fourier
operation [1]. In several studies, the index has been used to correlate both physical
and pharmacological properties in a series of highly varied structures [1]; two
thermodynamic functions in a series of linear and branched hydrocarbons [2]; bivariant
enzyme inhibition activity in a large series of organophosphorus compounds [3];
and, in an alternate form, as a descriptor for molecular similarity analyses [4]. The
FTm compares very favorably with descriptors derived from quantum chemical
(MNDO) calculations [5].

The successful use of the FTm for the correlation of a variety of indices led
naturally to considering its value as a correlator, or even predictor, of the
“extrathermodynamic” molecular parameter, the partition coefficient. This parameter,
reflecting hydrophobicity [6], is highly indicative of membrane as well as general
transport potential in biological systems and in its functional form is expressed as
the base-10 logarithm (log P) of the octanol/water partition coefficient as measured under
rather standard conditions [7-9]. Hansch and Leo developed an empirical paradigm
for its calculation known as the “π-fragment” method [10]. While there have been
variations on this methodology, perhaps the most promising from a basic theoretical
standpoint is that of Politzer and Murray, in which the calculation is based on
molecular surface area and electrostatic potentials [11-13]. However, inasmuch as
the FTm is also structure-based, and rather less complex to calculate, its value as a
correlator for log P was considered to be appropriate for investigation. For this
purpose a series of organophosphorus compounds was selected whose partition
coefficients had been measured by high-performance liquid chromatographic (HPLC)
methodology (Table I) [14]. For an evaluation of FTm correlation capability versus
partition coefficients calculated by the π-fragment method, the compounds shown
in Table II were chosen.


Methodology  and  Results 

The log P values shown in Table II were calculated by the π-fragment methodology
as described for examples cited by Leo, Hansch, and Elkins [15]. The fragment
values for the phosphoryl and analogous thiono entities were determined from the
reported measured partition coefficients of paraoxon and parathion [10], respectively,
by “backing out,” i.e., subtracting, the appropriate π-fragments; it was then
possible to “add in” the π-fragments necessary to construct the compound of interest
and thereby arrive at its log P value.


Table I.  HPLC-measured partition coefficients (log P) and FTm values.

Compound No.   Structure                       Log P^a   FTm^b (×10²)
 1             CH3P(O)(H)OC2H5                 -0.60     2.021
 2             CH3P(O)(OCH3)2                  -0.61     2.535
 3             C2H5P(O)(OC2H5)2                 0.66     3.208
 4             CH3P(O)(OC2H5)(SC2H5)            0.71     3.226
 5             CH3P(O)[OCH(CH3)2]2              1.03     3.435
 6             CH3P(S)(OC2H5)2                  2.08     3.120
 7             CH3P(O)(OC4H9)2                  3.26     3.891
 8             CH3P(S)(OC3H7)2                  3.26     3.074
 9             CH3P(O)(OC5H11)2                 3.74     4.350
10             CH3P(O)[OCH(CH3)C(CH3)3]2        4.29     4.811
11             CH3P(O)(OC8H17)2                 6.13     5.735

a  Taken from Ref. [14].
b  Molecular transform index.


Table II.  Calculated partition coefficients (CLog P) and FTm values.

Compound No.   Structure                                      CLog P^a   FTm^b (×10²)
 1             CH3P(O)(F)O-iso-propyl                          0.54      2.801
 2             CH3P(O)(F)OCH(CH3)C(CH3)3                       1.84      3.488
 3             CH3P(O)(F)O-cyclohexyl                          2.04      3.596
 4             CH3P(O)(F)O-(2-methylcyclohexyl)                2.34      3.826
 5             CH3P(S)(F)OCH(CH3)C(CH3)3                       2.30      3.442
 6             CH3P(S)(F)O-(2-methylcyclohexyl)                2.50      3.908
 7             CH3P(S)(F)O-cyclopentyl                         2.00      3.306
 8             CH3P(O)(F)O(CH2)2N(7ra-propyl)2                 1.54      4.269
 9             CH3P(O)(OCH2CH3)SCH2CH2N(CH(CH3)2)3             1.57      4.830
10             (CH3O)2P(O)-(1-morpholinyl)                    -2.05      4.006
11             (CH3)2NP(O)(F)O(CH2)2N(CH3)2                   -0.86      3.970
12             (CH3)2NP(O)(F)O(CH2)2N(/^-propyl)2              0.64      4.876
13             (CH3)2NP(O)(F)O(CH2)3N(CH3)2                   -0.36      4.157
14             (CH3)2NP(O)(F)S(CH2)2N(iso-propyl)2             2.02      5.017
15             (CH3)2NP(O)(F)(O(CH2)2-(1-pyrrolidinyl))       -0.56      4.484
16             (CH3)2NP(O)(F)(O-3-(1-methyl-pyrrolidinyl))    -1.05      4.293
17             (CH3)2NP(O)(F)(O-(3-quinuclidinyl))            -0.05      4.888

a  Calculated according to Reference [10].
b  Molecular transform index.

The  FTm  values  were  calculated  from  matrices  of  bond  distances  and  atomic 
numbers  of  constituent  atoms  of  the  molecules  as  previously  reported  [1-4];  in 
these  citations  the  FTm  was  denoted  as  SQRT.  The  bond  distances  were  taken  from 
Gordon  and  Pople  [16],  March  [17],  and  Corbridge  [18].  The  correlation  trials 
were  performed  on  a  Texas  Instruments  TI-59  Programmable  Calculator  using  a 
regression  program  described  by  Clark  [19]. 

Correlation  Analyses 

Linear  regression  of  the  log  P  and  FTm  values  for  the  compounds  shown  in  Table 
I  gave  the  following: 


$$\log P = 0.0186\,\mathrm{FT}_m - 4.472 \qquad (1)$$

n = 11   R = 0.911   S = 0.931   F = 44.176

The compounds in Table II were considered in two groups as shown below:

Compounds 1-7:

$$\log P = 0.0163\,\mathrm{FT}_m - 3.722 \qquad (2)$$

n = 7   R = 0.909   S = 0.299   F = 23.839

Compounds 11-16:

$$\log P = 0.0247\,\mathrm{FT}_m - 11.068 \qquad (3)$$

n = 6   R = 0.872   S = 0.636   F = 12.745


where: 

n  =  number  of  compounds, 

R  =  correlation  coefficient, 

S = standard deviation,

F = F statistic.
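
As a check on Eq. (1), the regression can be recomputed from the Table I entries. The following is a minimal sketch, not the TI-59 program used in the original work: it assumes that the tabulated FTm entries are to be multiplied by 10² (so that 2.021 is entered as 202.1), and the function name `simple_regression` is chosen here only for illustration. Under that assumption the slope, intercept, R, S, and F come out close to the values quoted for Eq. (1).

```python
import math

# Table I data. Assumption: tabulated FTm entries are scaled by 10^2.
ftm = [202.1, 253.5, 320.8, 322.6, 343.5, 312.0, 389.1, 307.4, 435.0, 481.1, 573.5]
log_p = [-0.60, -0.61, 0.66, 0.71, 1.03, 2.08, 3.26, 3.26, 3.74, 4.29, 6.13]


def simple_regression(x, y):
    """Return slope, intercept, R, S and F for an ordinary least-squares line."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)     # correlation coefficient
    sse = syy - slope * sxy            # residual sum of squares
    s = math.sqrt(sse / (n - 2))       # standard deviation about the line
    f = (syy - sse) / (sse / (n - 2))  # F statistic with 1 and n - 2 d.f.
    return slope, intercept, r, s, f


slope, intercept, r, s, f = simple_regression(ftm, log_p)
print(f"log P = {slope:.4f} FTm {intercept:+.3f}   R = {r:.3f}  S = {s:.3f}  F = {f:.1f}")
```

The same function could be applied to the two Table II groups to compare with Eqs. (2) and (3).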

For the compounds in the first group in Table II (nos. 1-7), inclusion of compound
8 in the regression dropped the correlation coefficient (R) to 0.543. Inclusion of
compound no. 17 in the second group resulted in an R value of 0.800. Compounds
9 and 10 did not fit into either correlation grouping.

Discussion 

The success of the molecular transform index as a unitary structure descriptor
for the analysis and prediction of structural features, physicochemical and
thermodynamic properties, and pharmacological activity, as shown in previous studies,
serves to underline its viability in the present instance. Nowhere is this more evident
than in the correlation for the compounds in Table I. In this case, the partition
coefficients had been measured with a high degree of precision, and this is reflected
in the relatively high correlation coefficient even though the series spans several
orders of magnitude of the partition coefficient itself. In contrast, the correlations
for the compounds shown in Table II, while acceptable for preliminary estimations,
may be indicative of the empirical nature or limitations of the particular partition
coefficient calculation method. With respect to log P estimation, it is known that, for
polar molecules such as the organophosphorus compounds, parameter estimation is not
well established, and in this case it was based upon only two actual measurements.
From a structure correlation viewpoint, the FTm very nicely separated the fluoridates
and the phosphoramides and, indeed, a rough plot of log P versus the FTm values
shows this separation as two approximately parallel classes.

The log P estimation problem for polar compounds is quite evident when compound
8 is included with compounds 1-7 of Table II, as the correlation declines
precipitously. This degradation is less drastic when compound 17 is included with
numbers 11 through 16. However, accounting for the lack of fit of compounds 9
and 10 in either regression is difficult. One may argue that 10 is a phosphate, the
lone member of a class not elsewhere considered in this study. From the general
trend of the data herein, it would appear likely that the log P values for compound
9, and perhaps 8 also, were underestimated. In any case, further studies with these
and other classes of compounds are warranted.

Conclusions

The molecular transform index (FTm) has been shown to be a useful unitary
structure descriptor for correlation of the octanol/water partition coefficient and
structure in series of organophosphorus compounds. This was most evident where
the log P values for a series containing phosphonates, phosphonothionates, and a
phosphinate were experimentally measured with precision. The FTm also served
to functionally discriminate two series for which the log P was calculated by the
Leo/Hansch π-fragment methodology. One contained phosphonofluoridates and
thionophosphonofluoridates, and the other predominantly phosphoramides.

Abstract

We consider the paradoxical situation arising in standard multiple regression analysis: as
the standard error of prediction decreases with the introduction of additional variables (descriptors),
the standard errors of the regression coefficients increase, often to the point of
the coefficients having no statistical validity. We trace the origin of this paradoxical situation to
intercorrelation of the variables. A remedy to this curve-fitting paradox is the introduction of orthogonal
variables or descriptors. © 1994 John Wiley & Sons, Inc.


Introduction 

An apparent contradiction observed in the rigorous sciences is referred to as a
paradox. A paradox may be a consequence of the lack of an adequate conceptual basis,
as is illustrated by the paradox of Zeno, one of the best-known paradoxes from
antiquity. Zeno posed it as the problem of a race between Achilles, the athlete of
ancient Greek mythology, and the tortoise. The ancient Greeks had no notion of the
limit and so could not recognize that, while there is no end to the sequence of
hypothetical steps Achilles must take to reach the current position of the tortoise,
which was given an initial advantage, the constructed sequence has a limit.

Often, the term paradox is used in a less rigorous sense. Thus, the so-called
Cretan Paradox illustrates an apparent but not a genuine paradox. The philosopher
Epimenides is credited with having said: “All Cretans are liars.” Was he, being a Cretan,
lying or speaking the truth? The statement is not a true paradox since it may simply
be a false statement [1].

The apparent and genuine contradictions often have their origins in the
metamathematical context of statements that refer to themselves. In a way, the
paradox that we will discuss in this article has elements of self-reference. We will
focus on an apparent paradox of numerical content, involving statistical analysis.
Consider curve fitting either by a power series or by a closely related multiple linear
regression analysis. In such problems, one considers a set of powers $1, f(x), f^2(x),
f^3(x), \dots, f^k(x)$, the simplest cases being $f(x) = x$ or $f(x) = 1/x$, to represent a
curve, or a set of descriptors $d_1, d_2, d_3, \dots, d_k$ to represent a plane in k-dimensional
space. We can consider different powers $x^k$ or $f^k(x)$ as different descriptors $d_k$ to
represent a property P of a molecule

$$P(G_n) = c_1 d_1 + c_2 d_2 + c_3 d_3 + \dots + c_k d_k, \qquad (1)$$

which reduces the power expansion to multiple regression analysis. Here, $G_n$ is the
n-th object (molecule, graph) in a set for which data are known and are to be fitted by
the descriptors $d_k$. The coefficients $c_k$ are determined by the least-squares procedure.

The precision of the fitting of the data is measured by the standard error S and
the associated coefficient of regression R or its square R² (the coefficient of
determination). Statistically significant descriptors tend to decrease the standard
error and increase the coefficient of regression R. In the ideal situation, the
limiting value for S is zero, while R approaches one.

All this is well known. However, an analysis of the statistical parameters of the
regression equation, i.e., an analysis of the coefficients $c_k$ appearing in a regression
equation, has often been overlooked. When undertaken, it was observed with great
disappointment that an increase in the number of descriptors (or the number of power
terms, in the case of a power series expansion) dramatically worsens the standard errors
of the coefficients of a regression equation. That means that the equations
become less reliable as more descriptors are used. By the time one reaches
a satisfactory standard error S for the property P, the standard errors for the
corresponding coefficients in the equation of regression show that the coefficients are
statistically meaningless. So, we have a paradoxical situation: To obtain reliable
predictions for properties, the analysis rests on statistically “unreliable” equations.
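
The effect described above can be seen on a small numerical example. The sketch below uses synthetic data invented purely for illustration (it is not the octane data discussed in this article) and a hand-written least-squares helper, `regress`, which is a hypothetical name. It fits a property first on one descriptor and then on two strongly intercorrelated descriptors, and reports the standard error S of the fit alongside the standard errors of the coefficients, computed in the usual way from the diagonal of the inverse normal-equations matrix.

```python
import math


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def regress(columns, y):
    """Return (coefficients, their standard errors, standard error S of the fit)."""
    n = len(y)
    design = [[1.0] * n] + [list(c) for c in columns]  # constant term first
    p = len(design)
    gram = [[sum(u * v for u, v in zip(ci, cj)) for cj in design] for ci in design]
    rhs = [sum(u * v for u, v in zip(ci, y)) for ci in design]
    coef = solve(gram, rhs)
    fitted = [sum(c * col[i] for c, col in zip(coef, design)) for i in range(n)]
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    s = math.sqrt(sse / (n - p))
    # Diagonal of the inverse normal-equations matrix, one unit vector at a time.
    inv_diag = [solve(gram, [1.0 if i == j else 0.0 for i in range(p)])[j]
                for j in range(p)]
    errors = [s * math.sqrt(d) for d in inv_diag]
    return coef, errors, s


# Synthetic, strongly intercorrelated descriptors (illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.2, 1.9, 3.3, 3.8, 5.1, 6.2, 6.8, 8.1]
y = [2.0, 2.9, 4.5, 4.4, 6.3, 6.9, 7.4, 9.2]

coef1, err1, s1 = regress([x1], y)
coef2, err2, s2 = regress([x1, x2], y)
print("one descriptor: ", [round(c, 3) for c in coef1], [round(e, 3) for e in err1], round(s1, 3))
print("two descriptors:", [round(c, 3) for c in coef2], [round(e, 3) for e in err2], round(s2, 3))
```

For these particular numbers, S decreases when the second descriptor is added, while the standard error of the coefficient of the first descriptor grows several-fold, which is the behavior the text calls paradoxical.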

Illustration 

Let us illustrate the paradox on the regression equation derived for the 18 isomers
of octane for their heats of formation ($H_f$):

$$H_f = 1.41 - 10.08 X_1 - 4.86 X_2 - 4.50 X_3 - 0.04 X_4 - 1.34 X_5 + 2.01 X_6 + 5.83 X_7. \qquad (2)$$

Here, the descriptor X1 is the connectivity index [2] and the Xk are the higher
connectivity indices [3]. The connectivity indices are mathematical invariants that
can be evaluated once the molecular structural formula is known. The connectivity
index X1 is given as a sum of weighted CC bond contributions. The weighting takes
into account the presence of the neighboring carbon atoms and is defined by the
following numerical values [2]:


CC bond weights                          Numerically

Primary-secondary        1/√(1·2)        0.70711
Primary-tertiary         1/√(1·3)        0.57735
Primary-quaternary       1/√(1·4)        0.50000
Secondary-secondary      1/√(2·2)        0.50000
Secondary-tertiary       1/√(2·3)        0.40825
Secondary-quaternary     1/√(2·4)        0.35355
Tertiary-tertiary        1/√(3·3)        0.33333
Tertiary-quaternary      1/√(3·4)        0.28868
Quaternary-quaternary    1/√(4·4)        0.25000

With the above definition of the bond contributions, one can easily compute the
connectivity indices for molecules of interest. When so calculated, connectivity
indices are taken as molecular descriptors in a regression of octane heats of
formation. The coefficient of regression and standard error associated with
Eq. (2) are R = 0.934 and S = 0.417. Hence, Eq. (2) will reproduce the heats of
formation of octane isomers, on average, with the above standard error. What is
unsatisfactory then about such a regression, assuming that we were justified to use
seven descriptors X1-X7?
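
The bond weights tabulated above amount to 1/√(m·n) for a bond joining carbon atoms with m and n carbon neighbors, so X1 can be computed directly from the carbon skeleton. The following is a minimal sketch under that reading; the function name `connectivity_index` and the atom numbering of the three example skeletons are chosen here only for illustration.

```python
import math
from collections import Counter


def connectivity_index(bonds):
    """Sum of 1/sqrt(d_i * d_j) over C-C bonds, where d is the number of
    carbon neighbours of each bonded carbon atom."""
    degree = Counter()
    for i, j in bonds:
        degree[i] += 1
        degree[j] += 1
    return sum(1.0 / math.sqrt(degree[i] * degree[j]) for i, j in bonds)


# Hydrogen-suppressed carbon skeletons as lists of bonded atom pairs.
n_octane = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8)]
two_methylheptane = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (2, 8)]
tetramethylbutane = [(1, 2), (1, 3), (1, 4), (1, 5), (5, 6), (5, 7), (5, 8)]  # 2,2,3,3-

for name, bonds in [("n-octane", n_octane),
                    ("2-methylheptane", two_methylheptane),
                    ("2,2,3,3-tetramethylbutane", tetramethylbutane)]:
    print(name, round(connectivity_index(bonds), 5))
```

For n-octane, for example, the sum consists of two primary-secondary and five secondary-secondary contributions, 2(0.70711) + 5(0.50000).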

The  above  standard  error  refers  to  the  “good”  part  of  the  regression  analysis:  its 
reproduction  of  the  input  data.  The  unsatisfactory  part  is  reflected  in  the  statistics 
of  the  individual  coefficients  of  the  regression  that  we  list  below: 


Variable    Coefficient    Standard error of
                           the coefficients

Const.         1.41           94.80
X1           -10.08           29.01
X2            -4.86            3.94
X3            -4.50            7.29
X4            -0.04           10.52
X5            -1.34           13.06
X6             2.01           15.78
X7             5.83           18.42

Clearly,  from  the  statistical  point  of  view,  the  regression  equation  is  meaningless, 
i.e.,  the  coefficients  of  the  equation  could  not  have  any  possible  meaning,  since  the 
errors  are  often  several  times  larger  than  the  coefficients  themselves.  A  referee  of  a 
manuscript  in  which  the  above  regression  equation  was  mentioned  lamentably 
stated: 

In general, the results from this kind of approach are terrible. Correlation coefficients
are excellent, computed standard deviations were very small, but when the regression
equations were examined, it was found that the standard deviations of the regression
coefficients were usually much larger than the values of the coefficients themselves. It
was gradually recognized that such equations were useless for predictive purposes, and
statistically invalid for correlating data.

I would agree with most of the factual statements of the above quote, but strongly
disagree with its conclusion, which I have emphasized. This article, in a way, is a
response to the incorrect conclusion of the above anonymous critic of multiple
regression analysis. The critic apparently has not recognized the paradoxical aspect
of the standard multiple regression analysis. The proper conclusion to draw from the
statistical behavior of the coefficients of the above regression equations is that the
equations are useless for (coefficient) interpretation, but their use for the predictive
purpose of correlating data is statistically valid.

Use of the equations and the validity of the predictions made by the equations
are not necessarily related to their instability. The validity of the equations is
measured by the standard errors Sk of the coefficients; the quality of the predictions
made by the equations is determined by the standard deviation S. The latter can
be statistically significant and acceptable, whereas the former are at the same time
statistically unacceptable. The instability of the equations means that the coefficients
of the equation need not be reproducible, but the “predictions” based on such
equations, nevertheless, can give statistically acceptable results.

We will support our claim in the next section. Before that, let us briefly discuss
the conclusions that the anonymous critic offered. First, if he/she were right,
that would have devastating consequences for the work of the many people who, over
the last several decades, if not longer, have used multiple regression analysis. Could it
be that most of the past structure-property and structure-activity correlations were
a waste? That the work of Corwin Hansch and his school [4] was all meaningless?
I don’t think so.

The  critic  claims  that  this  apparent  fault  of  the  multiple  regression  “was  gradually 
recognized”  as  to  produce  equations  that  “were  useless  for  .  .  .  correlating  data.” 
Recognized  by  whom?  Published  where? 

What  few  may  have  realized  over  the  years  is  that  widely  different  equations  can 
produce  numerically  similar  results  for  the  prediction  of  properties.  In  other  words, 
the  regression  equations  that  reproduce  molecular  properties  within  a  given  accuracy 
(measured  by  the  standard  deviation  S)  need  neither  to  be  unique  nor  even  closely 
similar.  Because  the  coefficients  of  the  equations  can  be  widely  different,  they  cannot 
possibly  have  a  statistical  significance.  One  may  say  that  the  equations  of  multiple 
regression  are  unstable,  having  elements  of  a  “chaotic”  behavior,  in  that  a  small 
perturbation  of  the  equations,  such  as  obtained  by  introducing  an  additional  variable 
that  only  slightly  reduces  the  standard  error  S ,  may  drastically  change  the  coefficients 
of  the  already-present  variables. 

On  Instability  of  Regression  Equations 

To illuminate the instability of the regression equations, we list in Table I the
regression equations obtained by successive stepwise inclusion of additional
variables. We will continue to use the connectivity indices X1-X5 as descriptors and
consider application to octane isomers, but the outline and the conclusions are
more general. They apply to correlations of other molecules and other properties
and, in particular, to correlations using other descriptors. The left half of Table I
shows the variables used, the coefficients of the regression equation, and the
corresponding standard errors of the coefficients. In the right half of the table, for each
regression equation, we give the coefficient of regression R, the standard error
S, and the Fisher ratio F. Since all the molecules considered are of the same size,
the dominant dependence of the property on the molecular size is constant and
the three parameters R, S, and F all represent equally well the quality of the
regression. We will follow their variations as the number of descriptors is increased.

Table I.  The regression equations (column two) and the standard error of the coefficients of the equations
(column three) for stepwise inclusion of the descriptors (the connectivity indices Xk).

Variables   Coefficient   Standard error   R        S       F

Const        75.2462        3.62           0.679    1.699   41.82
X1           -6.4225        0.99

Const        16.9912       14.88           0.930    0.489   48.18
X1            6.2484        3.27
X2            3.7949        0.95

Const        37.410        50.32           0.931    0.503   30.43
X1            1.9202       10.70
X2            2.554         3.07
X3           -0.3829        0.90

Const        75.6936       69.53           0.935    0.509   22.42
X1           -5.918        14.54
X2            0.1327        4.32
X3           -1.2092        1.37
X4           -0.6376       -0.79

Const        19.2243        1.15           0.965    0.391   32.46
X1            3.8336       11.57
X2            4.2371        3.56
X3            1.5555        1.36
X4            1.8334        0.99
X5            3.634         1.15

Const        23.3773       57.61           0.9667   0.398   25.14
X1            1.2413       12.30
X2            4.5566        3.65
X3            2.8204        2.20
X4            3.5232        2.49
X5            5.6669        2.98
X6            3.2272        4.20

Const        -0.4417       94.37           0.9670   0.416   20.60
X1            9.7097       28.88
X2            4.8577        3.92
X3            0.5693        7.25
X4            0.2000       10.48
X5            1.5244       13.00
X6           -1.8233       15.71
X7           -6.0037       18.34

R is the coefficient of the regression; S is the standard error of prediction; and F is the Fisher ratio.

The linear correlation, based on a single descriptor,

$$P = -6.4225 X_1 + 75.2462, \qquad (3)$$

the first equation in Table I, does not yield a satisfactory result: the coefficient of
regression is too low, barely reaching 0.679. Hence, the equation explains
less than half of the variance in the data (the coefficient of determination R² =
0.461). When we combine the connectivity indices X1 and X2, we already obtain
a relatively satisfactory correlation:

$$P = 6.2484 X_1 + 3.7949 X_2 + 16.9912. \qquad (4)$$

The standard error is now reduced by more than a factor of three. The coefficient of
regression is R = 0.930 and the Fisher F ratio is significantly increased. If one stops
at this stage in the structure-property analysis and tries to interpret the results, i.e.,
tries to identify structural components that can explain most of the variation in the
data, one already faces serious difficulty: The two regression equations, one based
on X1 and the other using the two descriptors X1 and X2, bear no resemblance.
Consequently, contributions of the individual descriptors cannot be determined
unambiguously.

The anonymous criticism, “The results from this kind of approach are terrible,”
does not apply to the first equation. That equation was unsatisfactory, i.e., not
good enough from a predictive point of view (large S), because the descriptor
X1 did not account for the major part of the isomeric variations. The coefficients of
the equation were, however, associated with relatively small standard errors. When the
second descriptor was introduced, we observed a dramatic drop in the value of the constant
term, from 75.24 to 16.99. The coefficient of the initial variable X1 also shows a
“wild” behavior. It shows a similar magnitude but has changed its sign: from -6.42,
it became 6.25. The standard error for the constant term has jumped from about
5 to 50% and for the coefficient of X1 from about 15 to 50%. The standard error
for the coefficient of the new term, X2, is, however, respectable. In contrast, the
standard error for “prediction” of the property shows a significant decrease. It
dropped from 1.70 to 0.49.

When we include the additional descriptors X3 and X4, the new variables do not
reduce the standard error, which remains more or less constant. The corresponding
coefficients of regression slightly increase and the F ratio decreases, suggesting
limitations of the two descriptors, X3 and X4. The additional variables X3 and X4
do not introduce significant improvement in the multiple regression analysis
beyond the description already arrived at using X1 and X2. However, we are here
interested in the mathematics behind the regression equations, so we will continue
with the stepwise inclusion of molecular descriptors in order to see how extension
of the regression influences the stability of the regression equations.

Consider the values for the constant term in subsequent equations: 75.24; 16.99;
37.41; 75.69; 19.22. So what is the constant term in linear regression of the heats
of formation of octanes? The same can be asked about the contribution of X1,
which varies as -6.42; +6.25; +1.92; -5.92; +3.83. The variation of the individual
contributions shows that something strange is accompanying the multiple regression
analysis.

The instability that we observe is a consequence of very strong mutual
intercorrelation of the connectivity indices, the descriptors X1-X5. Each time that we
introduce a new descriptor, since it correlates with the descriptors already used, it
modifies their contributions and changes their magnitudes, which have hitherto
characterized the roles of the previous variables.

Let us now focus attention on the last coefficient in each of the equations of
Table I: the coefficient describing the role of the last variable of the stepwise
regression. We claim that this coefficient measures the true role of that descriptor in the
regression. We indicated those coefficients in Table I by bold face. They are
reproduced here, accompanied by the statistical parameters for the corresponding
stepwise equation:


Added variable   Coefficient   R       S       F      New label

Const            75.2462
X1               -6.4225       0.679   1.699   41.8   Ω1
X2                3.7949       0.930   0.489   48.2   Ω2
X3               -0.3829       0.931   0.503   30.4   Ω3
X4               -0.6376       0.935   0.509   22.4   Ω4
X5                3.634        0.965   0.391   32.4   Ω5

Furthermore, we claim that the above is the correct regression equation that,
first, shows a numerical stability and, consequently, second, allows one to interpret
the relative roles of the associated variables. The new variables were designated as
Ωk and represent the corresponding orthogonal components of the initial variables
Xk. The claim follows our previous work on orthogonalization of molecular
descriptors [5,6]. The new variables Ωk describe those parts of the stepwise-introduced
descriptor that do not correlate with any previously used descriptors. The
construction of such descriptors is based on residuals of regressions with the descriptors
already used, which, in turn, are based on residuals of their intercorrelations, and
so on.
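
The residual-based construction just described can be sketched numerically. The example below uses synthetic data invented for illustration (not the octane data) and hypothetical helper names (`fit`, `predict`, `orthogonalize`); it is a sketch of the idea rather than the procedure of Refs. [5,6]. Each descriptor after the first is replaced by its residual from a regression on the descriptors already used, and the coefficients of a single regression on these orthogonal variables are compared with the last coefficients of the ordinary stepwise regressions.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit(columns, y):
    """Least-squares coefficients [const, c1, ..., ck] of y on the given columns."""
    design = [[1.0] * len(y)] + [list(c) for c in columns]
    gram = [[sum(u * v for u, v in zip(p, q)) for q in design] for p in design]
    rhs = [sum(u * v for u, v in zip(p, y)) for p in design]
    return solve(gram, rhs)


def predict(coef, columns):
    """Values predicted by coefficients [const, c1, ..., ck]."""
    return [coef[0] + sum(c * col[i] for c, col in zip(coef[1:], columns))
            for i in range(len(columns[0]))]


def orthogonalize(columns):
    """Replace each descriptor by its residual after regression on the earlier ones."""
    ortho = [list(columns[0])]
    for col in columns[1:]:
        fitted = predict(fit(ortho, col), ortho)
        ortho.append([v - f for v, f in zip(col, fitted)])
    return ortho


# Synthetic, strongly intercorrelated descriptors and property (illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.2, 1.9, 3.3, 3.8, 5.1, 6.2, 6.8, 8.1]
x3 = [0.9, 2.2, 2.8, 4.3, 4.9, 6.1, 7.2, 7.7]
y = [2.0, 2.9, 4.5, 4.4, 6.3, 6.9, 7.4, 9.2]

descriptors = [x1, x2, x3]
omegas = orthogonalize(descriptors)

stepwise = [fit(descriptors[:k], y) for k in (1, 2, 3)]  # ordinary stepwise fits
orthogonal = fit(omegas, y)                              # one fit on the residual variables
truncated = fit(omegas[:2], y)                           # same fit with the last variable dropped

print("last stepwise coefficients:", [round(c[-1], 4) for c in stepwise])
print("orthogonal coefficients:   ", [round(c, 4) for c in orthogonal[1:]])
print("truncated orthogonal fit:  ", [round(c, 4) for c in truncated])
```

In this sketch the coefficient of each orthogonal variable agrees with the last coefficient of the corresponding stepwise regression, and dropping the last orthogonal variable leaves the remaining coefficients unchanged.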

Observe  that  the  R  and  S  values  listed  above  are  precisely  the  same  R  and  S 
values  listed  in  Table  I,  yet  the  approach  of  Table  I  has  been  attacked  by  the 
anonymous  critic  as  “terrible”  and  “useless.”  However,  the  predictions  based  on 
the  equations  of  Table  I  are  as  good  as  the  predictions  based  on  the  corresponding 
truncated  equation  listed.  Hence,  since  the  predictions  based  on  the  above  equations 
are  valid  (vide  infra),  and  they  do  not  differ  from  the  predictions  based  on  the 
equations  of  Table  I,  then  one  concludes  that  “unstable”  equations  can  equally 
well  give  a  prediction  of  properties. 

What Table I does not permit is for one to interpret the contributions of individual
variables. Hence, in the case of three variables, we can use either

$$P = 37.410 + 1.9202 X_1 + 2.554 X_2 - 0.3829 X_3$$

or

$$P = 75.2462 - 6.4225\,\Omega_1 + 3.7949\,\Omega_2 - 0.3829\,\Omega_3. \qquad (6)$$

Both equations will equally well reproduce the input data. Observe, as we have
already said, that the last coefficients in such comparative equations will always be
the same, since they describe the contribution of the new variable.

If we decide that the first two variables alone are sufficient to represent the regression, we have either

$$P = 16.9912 + 6.2484\,\chi_1 + 3.7949\,\chi_2$$

or

$$P = 75.2462 - 6.4225\,\Omega_1 + 3.7949\,\Omega_2. \qquad (7)$$

The first equation had to be constructed anew, whereas in the case of orthogonalized descriptors, we can truncate Eq. (6) and immediately write down the new equation. Because truncation does not change the coefficients when the variables are orthogonal, we can attribute to the first connectivity index $\Omega_1$ (or $\chi_1$) the contribution -6.4225, and to the second variable $\Omega_2$, which is that part of $\chi_2$ that does not parallel $\chi_1$, the value +3.79, etc. Moreover, if we want to use three descriptors, we could select $\Omega_1$, $\Omega_2$, and $\Omega_5$ as the best among the set of variables considered and eliminate $\Omega_3$ and $\Omega_4$ from the last equation in Table II to obtain

$$P = 75.2462 - 6.4225\,\Omega_1 + 3.7949\,\Omega_2 + 3.634\,\Omega_5. \qquad (8)$$
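The reason truncation leaves the remaining coefficients untouched can be made explicit with the normal equations. The following is a short derivation sketch, assuming (as the construction by residuals implies, although the text does not spell it out) that each $\Omega_k$ with $k \ge 2$ is orthogonal over the data points both to the constant column and to every other $\Omega_j$. The symbols $Z$, $G$, $D$, $a$, and $b_k$ are notation introduced here for illustration only.

```latex
% Design matrix: constant column, \Omega_1, and the residual-based \Omega_2 ... \Omega_m
Z = [\,\mathbf{1},\ \Omega_1,\ \Omega_2,\ \dots,\ \Omega_m\,], \qquad
Z^{\mathsf T} Z =
\begin{pmatrix} G & 0 \\ 0 & D \end{pmatrix},
\qquad
G = [\mathbf{1},\Omega_1]^{\mathsf T}[\mathbf{1},\Omega_1],
\qquad
D = \operatorname{diag}\!\left(\Omega_2^{\mathsf T}\Omega_2,\ \dots,\ \Omega_m^{\mathsf T}\Omega_m\right).

% Least-squares solution of P \approx a\,\mathbf{1} + \sum_k b_k\,\Omega_k
\begin{pmatrix} a \\ b_1 \end{pmatrix} = G^{-1}\,[\mathbf{1},\Omega_1]^{\mathsf T} P,
\qquad
b_k = \frac{\Omega_k^{\mathsf T} P}{\Omega_k^{\mathsf T}\Omega_k} \quad (k \ge 2).
```

Each $b_k$ depends only on its own $\Omega_k$, so deleting or adding any other orthogonal descriptor changes neither it nor the pair $(a, b_1)$.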

Statistical  Analysis  of  the  Orthogonal  Equations 

The  cause  of  the  instability  of  the  regression  equation  is  interdependence  among 
the  variables.  Those  who  did  not  experience  the  difficulties  of  instability  of  the 
regression  equations  were  fortunate  that  their  problems  did  not  call  for  variables 
that  were  strongly  intercorrelated.  However,  in  structure-property  and  structure- 
activity  studies,  as  a  rule,  one  has  not  only  intercorrelated  variables  but  also  often 
very  strongly  intercorrelated  variables — the  case  usually  referred  to  as  collinearity. 

We will now substantiate our claim on the numerical stability of the regression equations based on orthogonalized descriptors and justify the ad hoc construction of the “stable equations” from the coefficients of stepwise regressions. In Table II, we present the standard errors for the coefficients of the regression equations based on orthogonal descriptors rather than on the “unrefined,” intercorrelated connectivity indices $\chi_1$-$\chi_5$. Table II should be compared to Table I: both list sets of regression equations by giving the numerical values for the coefficients of all variables and the corresponding standard errors. Since the overall standard error of prediction S and the correlation coefficient R are the same whether the descriptors are orthogonal or not, the values of R and S are not repeated in Table II. Neither is the Fisher F ratio shown, as it also does not change, despite the numerical stability of the equations.
Table II. The regression equations and the standard errors of the coefficients for stepwise inclusion of the orthogonalized connectivity indices $\Omega_k$.

Variable       Coefficient    Standard error    F-ratio

Const           75.2462        3.62              41.82
$\Omega_1$      -6.4224        0.99

Const           75.2454        2.61              48.18
$\Omega_1$      -6.4222        0.72
$\Omega_2$       3.7949        0.95

Const           75.2456        2.69              30.43
$\Omega_1$      -6.4222        0.74
$\Omega_2$       3.7949        0.98
$\Omega_3$      -0.3833        0.90

Const           75.2463        2.72              22.43
$\Omega_1$      -6.4224        0.75
$\Omega_2$       3.7949        0.99
$\Omega_3$      -0.3833        0.91
$\Omega_4$      -0.6378        0.79

Const           75.2467        2.09              32.45
$\Omega_1$      -6.4225        0.57
$\Omega_2$       3.7945        0.76
$\Omega_3$      -0.3836        0.70
$\Omega_4$      -0.6380        0.60
$\Omega_5$       3.6317        1.15

F-ratios (as well as R and S) are the same as in Table I.


The first thing to observe in Table II is that the standard errors of the coefficients are smaller, often much smaller, than the corresponding coefficients. Hence, statistically speaking, the coefficients of the regression equations are significant. The equations of Table II fully negate the concerns of the anonymous critic and those who “gradually recognized that such equations were useless.”

There is yet another very interesting and important result that can be seen from Table II. The standard error of the same coefficient in different equations decreases as one introduces additional descriptors. This is just the opposite of what we have seen in Table I, where new nonorthogonal descriptors caused a dramatic increase in the standard errors for the coefficients of the hitherto considered equations. For example, when using orthogonal descriptors, the initial standard error for the constant term 75.2462 is 3.6242 (about 4.8%). After we added new (orthogonal) descriptors, it changed to 2.61, 2.68, 2.71, and, finally, 2.09 (i.e., 2.8%). If one looks carefully at the entries of Table II, one can see that the decrease in the standard error for the coefficients was not uniform. The increase in the standard error (upon introducing $\Omega_3$ and $\Omega_4$) is also reflected in the decrease of the F ratio. This suggests a worsening of the quality of the regression from the statistical




Table III. The correlation matrix for the connectivity indices (upper part) and orthogonalized connectivity indices (lower part).

              Hf        $\chi_1$   $\chi_2$   $\chi_3$   $\chi_4$   $\chi_5$

Hf            1.000     -0.500      0.912     -0.207     -0.440     -0.134
$\chi_1$                 1.000     -0.976     -0.185      0.527      0.577
$\chi_2$                            1.000     -0.023     -0.521     -0.417
$\chi_3$                                       1.000     -0.452     -0.620
$\chi_4$                                                  1.000      0.125
$\chi_5$                                                             1.000

              Hf        $\Omega_1$ $\Omega_2$ $\Omega_3$ $\Omega_4$ $\Omega_5$

Hf            1.000     -0.850      0.337     -0.041     -0.080      0.240
$\Omega_1$               1.000      0.000      0.000      0.000      0.000
$\Omega_2$                          1.000      0.000      0.000      0.000
$\Omega_3$                                     1.000      0.000      0.000
$\Omega_4$                                                1.000      0.000
$\Omega_5$                                                           1.000

viewpoint. This variation of the standard error of the coefficients can assist one in selecting optimal descriptors. Based on these considerations, we can suggest as optimal the regression equation

$$P = 75.2467 - 6.4225\,\Omega_1 + 3.7945\,\Omega_2 + 3.6317\,\Omega_5. \qquad (9)$$

Owing to the iterative nature of the orthogonalization algorithm, individual coefficients can show small oscillations (see Table II). These are due to the accumulation of rounding errors; with increased precision (double or even higher), the fluctuations are eliminated.†


Concluding  Remarks 

The root of the instability of the multiple regression equation is a strong interdependence of the descriptors. The intercorrelation of the descriptors used is illustrated in the correlation matrix for the connectivity indices (Table III). Observe the strong correlation between $\chi_1$ and $\chi_2$. The index $\chi_3$ is relatively weakly correlated with $\chi_1$ and $\chi_2$, but it appears that $\chi_3$ is not very relevant for the molecular property of octanes, as already discussed. In contrast, the correlation matrix for the orthogonalized descriptors $\Omega_k$ shows that the new variables are not intercorrelated (the lower part of Table III). The submatrix describing the new intercorrelations is the identity matrix.


† D. Plavšić (Institute Rudjer Bošković, Zagreb, Croatia, 1993) carried out the orthogonalization for the connectivity indices using 20-digit arithmetic and obtained complete agreement for the coefficients to 20 digits.
The purpose of this article was to demonstrate that multiple regression analysis continues to be a viable and important theoretical tool for data reduction. Although most users have not openly voiced concern about the lack of statistical significance of regression equations, apparently a few, who “gradually recognized that such equations were useless for predictive purposes,” have been confused. They failed to distinguish between the statistical validity of the predictions and the statistical validity of the equations.

To see the difference, consider an analogy: a comparison between the Copernican planetary system and the geocentric system of Ptolemy. If we are interested in predicting solar or lunar eclipses, both approaches give valid predictions. Even the actual amount of computation need not be significantly different. But one should not try (with the exception of the members of the Flat Earth Society!) to give physical significance to the equations behind the epicyclic orbits of geocentric astronomy. The objections to regression analysis amount to the claim that the geocentric system (in astronomy) is useless. But it was useful. It served people well before the adoption of the Copernican heliocentric system. The difference between the two is in substance, or as Kepler replied when confronted with objections to his theory: “It is simpler, even if it cannot explain everything that Ptolemy's theory can.”

So, the orthogonal descriptors are simpler than intercorrelated descriptors, not necessarily computationally, but certainly conceptually. Moreover, even if they cannot always give simple explanations, they can give an explanation nevertheless, whereas nonorthogonal descriptors can, at best, offer some ambiguous description of the variables involved.
Abstract

We present a semiempirical NDDO procedure, called the fragment SCF (FSCF) method, to treat very large molecules. The covalent system is partitioned into a relatively small subsystem, where substantial chemical changes take place, and an environment that remains more or less unperturbed during the process. We expand the wavefunction on an atomic hybrid basis and perform an SCF procedure for the subsystem in the field of the iteratively determined electronic distribution of the environment. We wrote a program for the IBM RISC/560 computer and carried out several test calculations for a variety of large classical molecules. Protonation energies, proton transfer potential curves, rotational barriers, atomic net charges, and HOMO and LUMO energies, as computed by the exact version of the NDDO method, are fairly well reproduced by our approximation. Using the FSCF method, we calculated the molecular electrostatic potential on the van der Waals envelopes of the specificity pocket of trypsin and the lysine side chain of the bound substrate and visualised their electrostatic complementarity. We developed a novel bulk-phase Monte Carlo simulation technique in which the energy is calculated by the above approximation and applied it to amorphous silicon (a-Si). Starting from a distorted tetrahedrally bonded random network model of a-Si with 216 atoms, we performed Monte Carlo simulations using the FSCF energy calculation. For the second and subsequent configurations, we exploited a feature of the Metropolis-Teller algorithm, namely, that only a single atom is displaced to generate a new configuration. Thus the number of integrals to be calculated decreases drastically, since only those that contain the coordinates of the displaced atom have to be reevaluated. After equilibration we obtained distribution functions almost identical to those corresponding to the distortion-free tetrahedrally bonded network. The same technique was applied to liquid chlorosilanes. We found that Si-Cl bonds elongate by 6 to 16 pm, while H-Si-Cl and Cl-Si-Cl angles change by 2-4° as compared to the gas phase. © 1994 John Wiley & Sons, Inc.

Introduction 

Quantum chemical calculations for small and medium-size organic molecules in the ground state have become almost routine at the ab initio and semiempirical levels of approximation [1,2]. One of the remaining challenges is the treatment of very large systems (proteins, nucleic acids, molecular liquids, surfaces, amorphous materials, etc.) containing hundreds to thousands of atoms. Computer simulations for these systems are based in most cases on empirical force fields [3] that may provide excellent results for a given problem but are, in general, not completely transferable. Accordingly, comparison of results obtained with different software packages is difficult. Calculations for reaction paths are especially problematic and need special treatment in each case. Another problem is that open-shell systems or problems related to charge transfer cannot be handled.

The most popular approach to the molecular orbital theory of very large covalent systems is to combine quantum mechanics, applied to the subsystem where the important chemical changes take place, with molecular mechanics, which describes the environment [4-7]. The major problem with this method is the definition of the subsystem-environment boundary, where hydrogen atoms have to be placed in order to provide a classical closed-shell model of the subsystem for which the quantum mechanical calculation can be done. These hydrogens may introduce spurious interaction energy terms with the surrounding atoms and thus lead to erroneous results. In quantum/classical methods, polarization of the environment by the subsystem is treated by introducing empirical parameters. This may yield very good agreement with experiment, e.g., for proteins, but empirical polarizabilities are available only for a limited class of atoms. Therefore, some interesting systems (e.g., silicates, zeolites, nonaqueous solvents) pose special problems. Reaction field theories are also popular; but, in some cases, the solvent model is oversimplified, or a nonlinear Schrödinger equation has to be solved, which may cause computational difficulties [8-11].

In order to provide a general and consistent solution to the above problems, we have been working on our fragment self-consistent field (FSCF) method for more than a decade [12-15]. Our philosophy, similar to that outlined in Ref. [4], is to partition the whole covalent system to be treated into a central part (subsystem), where important changes (chemical reaction, conformational change, excitation, ionization, etc.) take place, and an environment that has only a secondary effect on this localised event. This allows the sophisticated SCF treatment to be restricted to the subsystem and more and more approximations to be introduced with increasing distance from it. Such a model should account for charge transfer inside the subsystem, polarization between the subsystem and its close surroundings, and the electrostatic effect of the far-lying environment on the subsystem and the close surroundings, respectively.

Derivation of a molecular orbital theory on the basis of the above model is straightforward if we deal with minimal basis sets. In the case of larger than minimal basis sets, there is an ambiguity in the subsystem-environment partition, which may lead to imbalance and spurious results. The success of the semiempirical neglect of diatomic differential overlap (NDDO) methods in the treatment of a wide variety of molecular systems [2] tempted us to apply the FSCF model at the NDDO level in both the AM1 [16,17] and PM3 [14,18] parameterizations. In the following, we outline the method and then report on applications to various large covalent systems: proteins, amorphous materials, and liquids.
Method


Basis  Set 

In the same way as molecules may be built up from more or less transferable chemical bonds, we expand the wavefunction on the basis of strictly localized molecular orbitals (SLMOs). For closed-shell classical molecules we define one-center lone pairs, two-center $\sigma$-orbitals, and many-center $\pi$-orbitals [19-23]:

$$\varphi_i^{\mathrm{lp}} = h_{Ai}, \qquad (1)$$

$$\varphi_i^{\sigma} = c_{Ai} h_{Ai} + c_{Bi} h_{Bi}, \qquad (2)$$

$$\varphi_i^{\pi} = \sum_m c_{mi} h_m^{\pi}, \qquad (3)$$

where

$$h_{Ai} = b_i^{s} u_{ns}^{A} + b_i^{x} u_{np_x}^{A} + b_i^{y} u_{np_y}^{A} + b_i^{z} u_{np_z}^{A} \qquad (4)$$

is a normalized atomic hybrid orbital (HYO) centered on atom A. HYOs on the same atom are Löwdin-orthogonalized to obtain the $b_i$ coefficients in Eq. (4). $u_{ns}^{A}$, ... are Slater-type orbitals with principal quantum number n. The total wavefunction is the antisymmetrized product of the SLMOs (2N is the number of electrons considered):

$$\Psi = \det\left|\varphi_1^{\alpha}(1)\,\varphi_1^{\beta}(2) \cdots \varphi_N^{\alpha}(2N-1)\,\varphi_N^{\beta}(2N)\right|. \qquad (5)$$

For the subsystem we expand the wavefunction on the basis of HYOs ($2N_S$ is the number of electrons in the subsystem):

$$\Psi_S = \det\left|\psi_1^{\alpha}(1)\,\psi_1^{\beta}(2) \cdots \psi_{N_S}^{\alpha}(2N_S-1)\,\psi_{N_S}^{\beta}(2N_S)\right| \qquad (6)$$

with

$$\psi_i = \sum_A \sum_{j \in A} a_{ij} h_{Aj}. \qquad (7)$$

Secular  Equations 

In order to consider polarisation (inductive) effects, we may optimize the HYO coefficients $c_{mi}$ in Eqs. (2)-(3) to obtain the zeroth-order wave function by solving a set of coupled 2×2 or m×m secular equations for the SLMOs of the subsystem and the close environment:

$$F^{i}\,\mathbf{c}_{i} = \varepsilon_{i}\,\mathbf{c}_{i}. \qquad (8)$$

If we assume that the core Hamiltonian H and the density matrix P are block diagonal and that the differential overlap between HYOs a and b is zero, the Fockian is written as follows (cf. Ref. [14] for more details):

$$F_{aa} = H_{aa}^{\mathrm{MNDO}} + \tfrac{1}{2} P_{aa}(aa|aa) + \sum_{b(\neq a) \in \alpha} P_{bb}(aa|bb) + \sum_{\beta(\neq \alpha)} \sum_{c \in \beta} P_{cc}(aa|cc), \quad a \in \alpha, \qquad (9a)$$

$$F_{ab} = H_{ab}^{\mathrm{MNDO}} - \tfrac{1}{2} P_{ab}(aa|bb), \quad a, b \in \alpha, \; a \neq b, \qquad (9b)$$

$$F_{ac} = 0, \quad a \in \alpha, \; c \in \beta. \qquad (9c)$$

Here a, b, c, and d refer to HYOs, $H^{\mathrm{MNDO}}$ and P are the core Hamiltonian in the MNDO approximation and the density matrix, $(ab|cd)$ denotes electron interaction integrals in the usual convention, and $\alpha$ and $\beta$ stand for SLMOs.

For the subsystem we have the following secular equation (S and E denote the subsystem and the environment):

$$F^{S} a_i = \varepsilon_i a_i \qquad (10)$$

with

$$F_{ab}^{S} = H_{ab} + \sum_{c,d \in S} P_{cd}\left[(ab|cd) - \tfrac{1}{2}(ac|bd)\right], \qquad (11)$$

$$H_{ab} = H_{ab}^{\mathrm{MNDO}} + \sum_{\alpha \in E} \sum_{c \in \alpha} P_{cc}(ab|cc) - \tfrac{1}{2} \sum_{\alpha \in E} \sum_{c,d \in \alpha} P_{cd}(ac|bd). \qquad (12)$$

The dimensionality of Eq. (10) is proportional to the size of S; the number of HYOs in E appears only in the electron interaction terms. Accordingly, the computational work is greatly reduced.

Monte  Carlo  Simulations 

We applied the Metropolis-Teller algorithm to generate configurations [24] and calculated the potential with the NDDO FSCF method. We applied periodic boundary conditions and performed all calculations according to the “minimum image convention” [25]. A new configuration was accepted or rejected as in the conventional Monte Carlo procedure. For the wave function of the starting configuration, we solved the secular equations in Eqs. (8) and (10). For the second, third, and subsequent configurations we exploited a special feature of the Metropolis-Teller algorithm, namely, that only a single atom is displaced to generate a new configuration. Thus, only those integrals that contain the coordinates of the displaced atom have to be reevaluated. The other integrals do not change, and the corresponding energy term is constant. With this simplification the computer time was reduced by a factor of 20.
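The bookkeeping behind the single-atom move can be sketched as follows. This is only an illustration of the idea that terms not involving the displaced atom stay constant: a Lennard-Jones pair potential in reduced units stands in for the FSCF energy (an assumption made here, since the quantum chemical energy is not pairwise additive in this simple way), and all function names and parameters are hypothetical.

```python
import numpy as np

def minimum_image(d, box):
    """Wrap displacement vectors into the nearest periodic image (cubic box)."""
    return d - box * np.round(d / box)

def atom_energy(pos, i, box, xi=None):
    """Pair energy between atom i (optionally at trial position xi) and all other atoms."""
    xi = pos[i] if xi is None else xi
    d = minimum_image(np.delete(pos, i, axis=0) - xi, box)
    inv6 = 1.0 / np.sum(d * d, axis=1) ** 3
    return float(np.sum(4.0 * (inv6 * inv6 - inv6)))   # Lennard-Jones stand-in

def total_energy(pos, box):
    """Full recalculation; each pair is counted twice in the sum, hence the factor 0.5."""
    return 0.5 * sum(atom_energy(pos, i, box) for i in range(len(pos)))

def metropolis_step(pos, energy, box, kT, max_step, rng):
    """Displace one randomly chosen atom; only terms containing that atom are reevaluated."""
    i = rng.integers(len(pos))
    trial = (pos[i] + rng.uniform(-max_step, max_step, size=3)) % box
    dE = atom_energy(pos, i, box, trial) - atom_energy(pos, i, box)
    if dE <= 0.0 or rng.random() < np.exp(-dE / kT):
        pos[i] = trial                    # accept: update coordinates and running energy
        return energy + dE, True
    return energy, False                  # reject: keep the old configuration

rng = np.random.default_rng(1)
box = 3.6
grid = np.arange(3) * 1.2
pos = np.array([[x, y, z] for x in grid for y in grid for z in grid])
energy = total_energy(pos, box)
accepted = 0
for _ in range(500):
    energy, ok = metropolis_step(pos, energy, box, kT=1.0, max_step=0.1, rng=rng)
    accepted += ok
```

After the loop, the running value `energy` agrees with a full recalculation by `total_energy(pos, box)` to within rounding, although each step touched only the terms of one atom.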

Although we had no problems with amorphous silicon, it was not so easy to obtain reliable statistics on liquid chlorosilanes by the classical Monte Carlo method. During our simulations the acceptance ratio decreased in some cases, which we had to compensate for by reducing the maximal step size for molecular translation. This was not necessary for rotational and intramolecular motions. Accordingly, our statistics for quantities that depend on intermolecular distances were not reliable, while they were quite good for quantities depending on intramolecular and orientation variables. In order to check the reliability of our simulations, we averaged the distributions of bond lengths, bond angles, and intermolecular orientations over each consecutive 10 configurations of the last 100. The appropriate convergence of the above parameters is supported by the fact that we could not detect any difference between the pair, cosine, and orientation distribution functions for these averages, respectively. Simulation temperatures (K), densities (g cm$^{-3}$), and the number of steps in the Metropolis-Teller algorithm were as follows: SiH3Cl: 160.15, 1.145, 30,000; SiH2Cl2: 160.15, 1.42, 20,000; SiHCl3: 273.15, 1.34, 20,000.
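The convergence check over consecutive groups of 10 among the last 100 configurations can be sketched for a single scalar observable. This is a simplification assumed here for illustration (the text averages whole distribution functions), and the function name and the synthetic bond-length series are hypothetical, not data from the simulations.

```python
import numpy as np

def block_means(samples, block_size=10, n_last=100):
    """Means over consecutive blocks of block_size taken from the last n_last samples."""
    tail = np.asarray(samples, dtype=float)[-n_last:]
    if tail.size != n_last or n_last % block_size:
        raise ValueError("need at least n_last samples and n_last divisible by block_size")
    return tail.reshape(-1, block_size).mean(axis=1)

# Synthetic stand-in for a sampled bond length in pm (invented numbers).
rng = np.random.default_rng(0)
bond_lengths = 214.0 + rng.normal(scale=2.0, size=1000)

means = block_means(bond_lengths)
spread = means.max() - means.min()   # a small spread between blocks suggests convergence
```

A systematic drift of the block means, rather than scatter around a common value, would indicate that the run has not yet equilibrated.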

Implementation in the SYBYL Software

Since the input for an FSCF calculation should contain precise and unequivocal information on chemical bonding in the system studied, it is quite complicated and, for large molecules, practically impossible to construct. In order to allow simpler input, we wrote an interface to the SYBYL molecular modeling package [26] that uses the MOL and MOL2 file formats. The file consists of rows, each containing atom types, Cartesian coordinates, and bonding information with connectivity numbers. At present we may consider 33 atom types with different coordination numbers, charges, and geometries for the following elements: H, Li, C, N, O, F, Na, Al, Si, P, S, Cl, K, Ca, Br, and I. Additionally, it is possible to define hypothetical lone-pair centers and dummy atoms.

We had some problems with the definition of the delocalized $\pi$-bonds given in Eq. (3). For example, in SYBYL, butadiene appears as two double bonds connected by a single bond, while the FSCF method implies a four-center, 4$\pi$-electron delocalized system. This means that all adjacent carbon atoms with sp2 hybridization must be included. The implementation of an algorithm for the general case is not obvious because it may lead to an infinite loop for ring systems. We applied a sieving method to find all connected sp2 atoms, which must be repeated iteratively until no new connected atoms can be found.

The steps of the iterative sieving algorithm are the following:

(a) Sort the bonds of the molecule.

(b) Select the two atoms of the first double bond as part of the $\pi$-system.

(c) Go through all higher order bonds and check whether one of their atoms is connected to the atoms already selected.

(d) If the answer is yes for an atom, select it.

(e) Repeat the above procedure from (c) until no new atom can be found.

(f) Repeat the above procedure from (b) with the next bond until the last bond is reached.

Triple bonds require a similar treatment after processing all $\pi$-bonds.
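Steps (a)-(f) can be sketched as a small routine. This is one possible reading of the procedure, not the authors' implementation: the bond list format `(atom_i, atom_j, order)`, the function name, and the restriction to double bonds (order 2) are assumptions made here, and the special atom types discussed in the text are not handled. Because an atom is added to a system at most once, the loop terminates for rings as well.

```python
def find_pi_systems(bonds):
    """Group atoms of double bonds into delocalized pi-systems.

    bonds: iterable of (atom_i, atom_j, order) tuples with integer atom labels.
    Returns a list of sets, one per pi-system.
    """
    neighbours = {}
    for i, j, _ in bonds:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)

    # (a) sort the bonds; only double bonds seed or extend a pi-system here
    double = sorted((min(i, j), max(i, j)) for i, j, order in bonds if order == 2)
    pi_atoms = {atom for bond in double for atom in bond}

    assigned, systems = set(), []
    for i, j in double:                       # (b) and (f): take the next double bond
        if i in assigned or j in assigned:
            continue                          # already part of an earlier system
        system = {i, j}
        grew = True
        while grew:                           # (e) repeat until nothing new is found
            grew = False
            for atom in sorted(pi_atoms - system):    # (c) atoms of the other double bonds
                if neighbours[atom] & system:         # (d) bonded to a selected atom
                    system.add(atom)
                    grew = True
        assigned |= system
        systems.append(system)
    return systems

butadiene = [(1, 2, 2), (2, 3, 1), (3, 4, 2)]
benzene = [(1, 2, 2), (2, 3, 1), (3, 4, 2), (4, 5, 1), (5, 6, 2), (6, 1, 1)]
isolated = [(1, 2, 2), (2, 3, 1), (3, 4, 1), (4, 5, 2)]   # double bonds separated by an sp3 atom
```

With these inputs, butadiene and the Kekulé form of benzene each come out as a single delocalized system, while the two double bonds separated by a saturated atom remain separate.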

We implemented the above algorithm successfully; however, some specific considerations are required, and the SYBYL atom types must be supplemented by others. Some examples are the deprotonated hydroxy oxygen, the protonated aromatic nitrogen, and the oxygens of the carboxylate anion. The latter are treated separately because of the extra electron added to the $\pi$-system containing two atoms of the same type. In cases where the SYBYL molecular modeling software treats an aromatic system as nonaromatic, bond types must be corrected manually. An example is pyrrole: if it is represented as a set of two double and three single bonds, the nitrogen atom will have three single bonds and a lone pair and will be considered nonaromatic by the above procedure. Another problem we faced is that if the type of an atom is changed, its geometry may have to be changed as well. For example, if we replace an sp2 nitrogen with an sp3 one without changing the geometry around it from planar to tetrahedral, it is not possible to generate correct hybrids subsequently.


Applications 


Small  Model  Systems 

We investigated the effect of the size of the subsystem on the accuracy of some calculated molecular properties [14]. It was found that the optimal choice of the subsystem is a sphere of 500 pm radius. Deprotonation energies of systems like CH3(CH2)10COOH, CH3(CH2)9CHFCOOH, the Gly74-Ser75-Ser76-Ser77-Glu(COOH)78-Lys79-Ile80 fragment of α-chymotrypsin, and the (H2O)5-HOH(H2O)5 and (H2O)5HOH hydrogen-bonded chains (dissociating protons are denoted by boldface letters) differ from those calculated for the full system by less than 7.5 kJ/mol, and by 4.3 kJ/mol on average. The energy curve for the proton transfer from the above heptapeptide fragment to an ammonia molecule near the glutamic acid side chain, as obtained by the FSCF approximation, is almost identical to the exact one. Rotational barriers for the CH3(CH2)8CH2-CH2COOH and CH3(CH2)8CH2-CH2COO$^-$ molecules around the indicated C-C bond differ by less than 0.4 kJ/mol from the exact value. A larger deviation, 7.2 kJ/mol, is observed for the energy difference of the above heptapeptide fragment in two different conformations; it reduces, however, to 1.6 kJ/mol if the subsystem size increases to 600 pm. Atomic charges for the CH3(CH2)10COOH and (H2O)6 molecules differ by less than 1 millielectron. Even HOMO and LUMO energies, which are considered nonlocal properties, are reproduced for the CH3(CH2)10COOH molecule within an accuracy of 5 kJ/mol.

Electrostatic  Recognition  in  Trypsin 

As an application of the FSCF approximation, we calculated the molecular electrostatic potential on the van der Waals surface of the specificity pocket of trypsin as well as of the accommodated Lys side chain of the bound substrate. Coordinates were taken from the Protein Data Bank for the trypsin-bovine pancreatic trypsin inhibitor complex [27]. We followed the same procedure as in the case of the conventional MNDO, AM1, and PM3 wave functions [28]. We integrated the subroutine for the calculation of molecular electrostatic potentials (MEP) and fields (MEF) into the SYBYL software [26]. The wavefunction necessary for the calculation of the MEP and MEF is provided by the FSCF procedure as described above; the coordinates where these quantities have to be calculated can be given either by invoking a text editor or by definition within SYBYL. The latter method allows one to perform a calculation for a set of points on the molecular van der Waals surface. Fragments of the trypsin-inhibitor complex (subsystem, polarizable, and unpolarizable regions with all surface side chains un-ionized) are displayed in Figure 1. Note that in the present calculation we simply dropped the unpolarizable region since its influence is negligible.

We displayed the MEP in color on the van der Waals envelope of the Lys side chain of the substrate and on the accommodating envelope representing the specificity pocket of the enzyme (Fig. 2). The electrostatic complementarity is nicely visualized; negative and positive regions around Lys fit onto those with opposite sign on the specificity pocket envelope.

Amorphous  Silicon 

We carried out Monte Carlo simulations with the NDDO FSCF method using the PM3 parameterization [16]. We selected a distorted version of the Wooten model of amorphous silicon with 216 atoms and a number density of 5.005 × 10$^{-7}$ pm$^{-3}$ at T = 293.15 K as the starting configuration [29]. This model has only tetracoordinated silicon atoms, i.e., exclusively $\sigma$-type SLMOs of Eq. (2) were used to construct the zeroth-order wave function. We distorted this starting configuration with a constrained hard-sphere Monte Carlo procedure, forcing all Si atoms to have four neighbors within a distance of 270 pm and all Si-Si-Si angles to be larger than 95°.

Figure 1. Computer model of the trypsin-bovine pancreatic trypsin inhibitor complex [27]. Yellow: Lys-15 side chain of inhibitor (subsystem), red: polarizable region, blue: nonpolarizable region.

Figure 2. Molecular electrostatic potential (V) on the van der Waals envelope of Lys-15 emerging from the enzyme (left) and the inhibitor (right). Color codes: V < 0 blue, 0 < V < 150 kJ/mol green, V > 150 kJ/mol cyan.

The subsystem radius was 480 pm, including second neighbors of the displaced atom. After reaching equilibrium in 10,000 successful Monte Carlo steps, we averaged 50 configurations, taken at intervals of 500 steps. The total energy was obtained from the zeroth-order wave function. The radial distribution function fits the original Wooten one very well (cf. Ref. [29]), with some minor differences in the first-neighbor region near 235 pm. This may be due to the failure of the PM3 parameterization to reproduce Si-Si distances precisely [2]. It is important to note that the radial distribution function is continuous and smooth in the 460-500 pm region, which means that there are no spurious effects at the boundary between the subsystem and the environment. This would probably not be the case if the dangling bonds of the subsystem were saturated by hydrogen atoms. The calculated cosine distribution function is also in good agreement with the original one.
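A radial distribution function of the kind discussed here can be computed from a periodic configuration as sketched below. This is a generic illustration assuming a cubic box and the minimum image convention, not the authors' analysis code; the function name, the bin settings, and the random test configuration are hypothetical.

```python
import numpy as np

def radial_distribution(pos, box, r_max, n_bins):
    """g(r) for one configuration in a cubic periodic box (requires r_max <= box / 2)."""
    n = len(pos)
    d = pos[:, None, :] - pos[None, :, :]
    d -= box * np.round(d / box)                       # minimum image convention
    r = np.sqrt((d ** 2).sum(axis=-1))[np.triu_indices(n, k=1)]   # each pair once
    counts, edges = np.histogram(r, bins=n_bins, range=(0.0, r_max))
    shell_volumes = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    # pair counts expected for uniformly distributed (uncorrelated) atoms
    ideal = 0.5 * n * (n - 1) * shell_volumes / box ** 3
    centers = 0.5 * (edges[1:] + edges[:-1])
    return centers, counts / ideal

# Uncorrelated random points: g(r) should scatter around 1 at all distances.
rng = np.random.default_rng(0)
box = 10.0
pos = rng.uniform(0.0, box, size=(400, 3))
centers, g = radial_distribution(pos, box, r_max=box / 2, n_bins=10)
```

For a structured network, by contrast, sharp peaks appear at the neighbor distances, and an artificial step at a boundary radius would show up as a discontinuity in the same curve.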

The FSCF MC method allows one to calculate the total energy of the system treated. We obtained 6762 and 6765 kJ/mol/Si atom for our model and the Wooten model, respectively. The slight decrease for the latter is understandable, since we did the calculation for room temperature, while the Wooten model represents a structure at T = 0 K.
Table I. Geometry parameters for chlorosilane monomers (upper row) and dimers in vacuo (middle
row) and in the liquid state (lower row).

Parameter      (Si—Cl) pm           H-Si-Cl ∠ deg         Cl-Si-Cl ∠ deg
Molecule    Ab initio    AM1      Ab initio    AM1      Ab initio    AM1

SiH3Cl        217.1     207.1       107.3     109.5         —         —
              218.3     207.3       106.9     109.9         —         —
                —       223           —       107           —         —

SiH2Cl2       214.5     206.1       108.1     109.5       109.2     109.6
              215.3     206.2       108.0     109.6       108.4     109.4
                —       214           —       109           —       106

SiHCl3        212.7     205.0       109.9     109.5       109.0     109.4
              213.2     205.1       110.3     109.7       108.6     109.3
                —       211           —       111           —       107

Liquid  Chlorosilanes 

We did FSCF Monte Carlo simulations for SiH3Cl, SiH2Cl2, and SiHCl3 in the
liquid phase in order to study the effect of condensation on the molecular geometry
[30]. We wanted to separate geometry distortions due to simple hydrogen bonding
from bulk effects; therefore, we did calculations for various hydrogen-bonded dimers
with both ab initio and semiempirical methods. Ab initio calculations were carried
out at the Hartree-Fock SCF level with the GAUSSIAN 92 program package [31].
The Dunning-Huzinaga valence double-zeta basis set [32] was chosen for the valence
electrons, and in the case of Si and Cl the effective core potentials of Hay and
Wadt were applied [33]. The distance between the two atoms forming the hydrogen bond
in the dimer was fixed at 200 pm, and these atoms together with the two silicon
atoms were in a collinear position. For the same geometry the calculations were
repeated with the AM1 parameterization [17]. Results are shown in Table I. The
trends are in all cases similar for the ab initio and semiempirical calculations,
but the magnitude of the changes is smaller in the latter case. Silicon-chlorine
distances increase upon hydrogen bonding in all three molecules, but much less so
with the AM1 method. However, the trend becomes quite pronounced in the
liquid phase, for which bulk effects are responsible. The distortion of bond angles
is not as large; here again bulk effects have a more important impact. Further
details are given in Ref. [30].
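
The size of these trends can be read off Table I with a few lines of arithmetic. The sketch below is only a bookkeeping aid: the dictionary layout and names are hypothetical, and the liquid-state values are taken from the AM1 column of the lower row, which is our reading of the table layout rather than something the text states explicitly.

```python
# Si-Cl distances in pm from Table I: (monomer, dimer in vacuo) per method,
# plus the liquid-state value (assumed to belong to the AM1 column).
TABLE_I_SI_CL_PM = {
    "SiH3Cl":  {"ab_initio": (217.1, 218.3), "am1": (207.1, 207.3), "liquid": 223.0},
    "SiH2Cl2": {"ab_initio": (214.5, 215.3), "am1": (206.1, 206.2), "liquid": 214.0},
    "SiHCl3":  {"ab_initio": (212.7, 213.2), "am1": (205.0, 205.1), "liquid": 211.0},
}


def si_cl_shifts(table=TABLE_I_SI_CL_PM):
    """Si-Cl elongation in pm relative to the monomer, rounded to 0.1 pm."""
    shifts = {}
    for molecule, row in table.items():
        mono_ai, dimer_ai = row["ab_initio"]
        mono_am1, dimer_am1 = row["am1"]
        shifts[molecule] = {
            "dimer_ab_initio": round(dimer_ai - mono_ai, 1),
            "dimer_am1": round(dimer_am1 - mono_am1, 1),
            "liquid_am1": round(row["liquid"] - mono_am1, 1),
        }
    return shifts


if __name__ == "__main__":
    for molecule, s in si_cl_shifts().items():
        print(molecule, s)
```

On this reading, dimer formation lengthens the Si-Cl bond by 0.5 to 1.2 pm ab initio and by 0.1 to 0.2 pm with AM1, whereas the liquid-state values exceed the AM1 monomer values by 6.0 to 15.9 pm.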